I have a saved search that notifies me when a forwarder goes up or down based on various TcpInputProc and TcpOutputProc messages coming from both the indexer and the forwarder machines.
The problem I'm running into is that I'm seeing a bunch of messages like this even when the forwarder is not going down. Anybody know why this is happening, or a more reliable message that I can use for this:
05-26-2010 11:24:18.346 INFO TcpInputProc - Hostname=host.domain.com closed connection
I'm wondering if this simply means that the connection was temporarily closed due to no data, but that would seem odd since I'm seeing this primarily on a few servers that are fairly busy.
For anyone interested, my full search runs every 5 minutes and looks like this (be prepared to do some scrolling):
index=_internal sourcetype="splunkd" (TcpInputProc "closed connection" OR "Connection accepted from") NOT localhost | eval sender=if(searchmatch("TcpOutputProc"),host,"") | eval receiver=if(searchmatch("TcpInputProc"),host,"") | eval action=if(searchmatch("Connect* accepted OR to"),"up", "down") | eval sender=coalesce(Hostname,sender) | rex "to (?<receiver>[^:]+)(:\d+)?" | rex "from (?<sender>\S+)" | replace "dnsname.example.com" with "splunk.domain.com", "anotherdnsname.domain.com" with "therealservername.domain.com" in sender, receiver | stats min(_time) as start_time, max(_time) as end_time, list(action) as actions, first(action) as final_state by sender,receiver | eval start_time=strftime(start_time,"%I:%M %p") | eval end_time=strftime(end_time,"%I:%M %p")
I recommend using the hosts metadata and searching for events received. Metadata contains when the last event was received from a specific host, source, or sourcetype. You can use a where statement that compares the last time an event was received to ensure that data is streaming. The reasons this is better than searching the splunkd log:
The search I use is as follows:
| metadata type=hosts | eval diff=now()-recentTime | where diff < 600 | convert ctime(*Time)
This will tell you what hosts have sent data in the past 10 minutes.
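Since the original goal was to get notified when a forwarder goes down, the same comparison can be flipped to list hosts that have gone quiet. This is only a sketch: the 600-second threshold is an assumption carried over from the search above, and a host that legitimately sends data less often than every 10 minutes would show up here too, so the threshold may need tuning.

```
| metadata type=hosts | eval diff=now()-recentTime | where diff > 600 | convert ctime(*Time)
```

Saved as a scheduled search that alerts when the number of results is greater than zero, this would flag any host that has not sent data in the past 10 minutes.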