Hey There,
I am new to Splunk (please go easy on my knowledge :)). We have 150 servers that have Splunk forwarders on them. We want to check the status of the forwarders (stopped/running) on a regular basis. I know there's a topic around this (check whether hosts are sending any events; if not, the forwarder isn't running). Big question: how can I be sure that it's a forwarder problem and not the host itself? If you can provide a sample search, that'd be great! Thank you for your time.
Regards,
Raghav
There is no way in Splunk to tell if a host is up or running. As you pointed out, there are lots of searches to tell if the Splunk forwarder on a host is communicating to the indexers - but not if the host itself is up/down. Here is an example:
Query/Alert to detect if a forwarder stops reporting...
HOWEVER, you could write a script (for Linux, Windows or Python) that runs every few minutes and tests for the status of hosts. For example, the script could output a line for each host like this:
<timestamp> hostname=xyz status=up
<timestamp> hostname=abc status=down
And then you could have Splunk monitor this file and use it to report the status of hosts.
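A minimal sketch of such a script, assuming Python 3 and assuming that a successful TCP connection to a known port counts as "up". The port (22), the function names, and the output path are illustrative assumptions, not something from this thread:

```python
import socket
from datetime import datetime, timezone


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def format_status_line(hostname, up, when=None):
    """Build one '<timestamp> hostname=... status=up|down' line."""
    when = when or datetime.now(timezone.utc)
    status = "up" if up else "down"
    return f"{when.isoformat(timespec='seconds')} hostname={hostname} status={status}"


def write_status(hosts, path, port=22):
    """Append one status line per host to the file that Splunk monitors."""
    with open(path, "a", encoding="utf-8") as fh:
        for host in hosts:
            fh.write(format_status_line(host, is_reachable(host, port)) + "\n")
```

Run something like `write_status(["xyz", "abc"], "host_status.log")` from cron or Task Scheduler every few minutes, and point a Splunk file monitor input at the output file. A TCP check is used here instead of ping because it needs no special privileges; pick a port that is actually open on your hosts.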
try this
| metadata type=hosts
| eval lastHour=relative_time(now(),"-1h@h")
| eval yesterday=relative_time(now(), "-1d@d")
| where ( recentTime>yesterday AND recentTime
This is the one which we are using currently.
index=_internal source=*metrics.log group=tcpin_connections earliest=-2d@d
| eval Host=coalesce(hostname, sourceHost)
| eval age = (now() - _time )
| stats min(age) as age, max(_time) as LastTime by Host
| convert ctime(LastTime) as "Last Active On"
| eval Status=case(age < 1800,"Running",age >= 1800,"DOWN") | rename age as Age
| sort Status | table Host, Status, Age, "Last Active On"
The criterion: a forwarder is considered Down if no heartbeat has been received for 30 minutes (1800 seconds), and Running otherwise.
Hi Somesh,
I got some inconsistent results where min(age) didn't give proper timings. I have replaced it with latest, which I think gives proper results. What do you think?
| stats latest(age) as age, max(_time) as LastTime by Host
| convert ctime(LastTime) as "Last Active On"
| eval Status=case(age < 100,"Running",age > 900,"DOWN") | rename age as Age | eval Hour=floor(Age/3600) | eval Minute=floor((Age%3600)/60) | eval Age="-".Hour."h"." : ".Minute."m"
Have you tried the Deployment Monitor app? It's the easiest way to know whether a forwarder is running or not. Set an alert for when a forwarder stops sending data. It also reports missing sourcetypes and sources, indexing status, and more.
There you go
Splunk Deployment Monitor
Thanks
In the newest version of Splunk, the Splunk Deployment Monitor has been deprecated.
The suggestion is to use Splunk Deployment Monitor instead.
Yes I meant DMC http://docs.splunk.com/Documentation/Splunk/latest/DMC/DMCoverview
Thanks, C.
@Cesaredf - I think you mean the Distributed Management Console (DMC). In Splunk 6.3, the DMC can track forwarders and report if a forwarder goes "missing."
We are in the process 🙂 Thank you!
In a nutshell, you need to search both for forwarders and for the hosts. Then you can determine if it's a host problem or a forwarder problem.
Here is the dashboard panel I use for this:
<module name="HiddenSearch" layoutPanel="panel_row5_col1" autoRun="True">
<!-- Find and report on all Splunk Universal Forwarders and endpoints not running SUF. Skip IPs in the SUFExceptions file. -->
<param name="search"><![CDATA[index=_internal source="/opt/splunk/var/log/splunk/metrics.log*" sourcetype="splunkd" fwdType="*" |
dedup sourceHost | rename IPAddress AS hostip, sourceHost AS IPAddress, OS AS fOS |
fields IPAddress, hostname, fGUID, fOS, fwdType | append [loadjob savedsearch="my:app:HWDetailBase" |
rename OS AS hOS | fields IPAddress, ComputerName, hOS] |
transaction IPAddress |
eval HostName=coalesce(ComputerName, hostname) | eval OS=coalesce(hOS, fOS) |
eval "Forwarder State"=if(isnotnull(fwdType),"Running","NOT RUNNING") |
search [|inputlookup SUFExceptions.csv append=f| fields IPAddress |format "NOT (" "(" "" ")" "OR" ")"] |
sort "Forwarder State" | table IPAddress, HostName, OS, "Forwarder State"
]]></param>
<param name="groupLabel">Forwarder Status</param>
<module name="JobProgressIndicator"></module>
<param name="earliest">-24h</param>
<param name="latest">now</param>
<module name="PostProcess" layoutPanel="panel_row5_col1">
<param name="search"> | rename "Forwarder State" AS fState |
stats count(eval(fState=="NOT RUNNING")) AS nRun</param>
<module name="HTML" layoutPanel="panel_row5_col1">
<param name="html"><![CDATA[
<table>
<tr><td>Hosts:</td><td width=3></td><td>$results.resultCount$</td><td width=8></td><td>Not running:</td><td width=3></td><td>$results[0].nRun$</td></tr>
</table>
]]></param>
</module>
</module>
</module>
The SUFExceptions.csv file contains a single field, IPAddress, and is where I put hosts I know aren't running a forwarder. It saves modifying a lengthy where clause every time there's a change to the exception list.
The HWDetailBase search is a bit too long to list here, but it essentially combines all of our sources of host information (such as port_scan) and returns IPAddress, ComputerName, and OS fields.
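The "search for both forwarders and hosts" idea boils down to a small decision rule. Here is a hedged Python sketch of it, assuming you already have an independent up/down signal for each host and the number of seconds since its forwarder last appeared in metrics.log. The function name and the returned labels are hypothetical, and the 1800-second default simply mirrors the 30-minute criterion mentioned earlier in this thread:

```python
def diagnose(host_up, forwarder_age_seconds, max_age=1800):
    """Classify one host from two independent signals.

    host_up: True/False from an external reachability check.
    forwarder_age_seconds: seconds since the forwarder last reported,
        or None if it has never been seen.
    """
    forwarder_ok = (
        forwarder_age_seconds is not None and forwarder_age_seconds < max_age
    )
    if forwarder_ok:
        # Recent forwarder data implies the host is up as well.
        return "forwarder running"
    if host_up:
        # Host answers, but the forwarder is silent.
        return "forwarder problem"
    # Neither signal is present, so suspect the host itself.
    return "host problem"
```

For example, `diagnose(True, 5000)` returns `"forwarder problem"`, while `diagnose(False, None)` returns `"host problem"`.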
Thank you, Rich! I will try it and keep you posted. I appreciate your time and help.
Thanks,
Raghav
It Worked!!!! Awesome
I was just curious if you would be willing to share the script you wrote?
@Raghav2384 could you please share the script, or suggest something on this to me and @mmensch?
thanks in advance
Thank you, Iguinn! I will try the method you posted. I appreciate your time & help.
Thanks,
Raghav