I am trying to get component availability for the previous month from Nagios data coming into Splunk. Ultimately I want the percentage of time each component was available. I've developed the following search to calculate downtime for a single host (hardware names have been changed to protect the innocent):
index=nagios src_host=blahn1* HOST ALERT HARD NOT SERVICE* host="blah.umich.edu"|sort _time|delta _time AS duration p=1|where name="UP"|stats sum(duration) as total_down_time|table total_down_time
This appears to be working correctly. Once I can also calculate total uptime, I can get my percentage. (I'd eventually like to exclude maintenance windows, but they aren't in Nagios yet, so I can't.)
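For reference, the percentage step can be sketched outside Splunk as plain arithmetic. This is only a minimal illustration in Python, not part of the search above: the function name `availability_percent`, the example month, and the 3600 seconds of downtime are all made up for the example, and it assumes the whole reporting window counts as scheduled time (no maintenance windows excluded).

```python
from datetime import datetime, timezone

def availability_percent(total_down_seconds, window_start, window_end):
    """Return the percentage of the window during which the host was up.

    Hypothetical helper: assumes every second of the window is scheduled
    time, i.e. no maintenance windows are subtracted.
    """
    window_seconds = (window_end - window_start).total_seconds()
    if window_seconds <= 0:
        raise ValueError("window_end must be after window_start")
    up_seconds = window_seconds - total_down_seconds
    return 100.0 * up_seconds / window_seconds

# Example values (assumed): a 30-day month with one hour of total downtime.
start = datetime(2024, 4, 1, tzinfo=timezone.utc)
end = datetime(2024, 5, 1, tzinfo=timezone.utc)
pct = availability_percent(3600, start, end)
print(round(pct, 4))
```

In other words, uptime does not need its own search: it is the length of the reporting window minus total_down_time.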
My questions are as follows:
index=nagios tag=mxhosts HOST ALERT HARD NOT SERVICE* host="blah.umich.edu"|sort src_host, _time|delta _time AS duration p=1|delta src_host AS hostdiff|where name="UP" and ~something with hostdiff~|stats sum(duration) as total_down_time by src_host
Since I can't use a 'by' clause in the delta command, I was trying to use the hostdiff field created in the search as a way to filter out durations where the src_host changed (i.e. the previous event was from a different host, so that 'delta' shouldn't count).
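To make the intended logic concrete, here is a rough sketch in Python of what I'm after. It is not Splunk syntax: the function `downtime_by_host` and the sample events are invented for illustration, and it assumes each event is a (src_host, epoch seconds, state) tuple. The previous timestamp is tracked per host, so a time difference is never taken across two different hosts.

```python
from collections import defaultdict

def downtime_by_host(events):
    """Sum downtime per host from (src_host, epoch_seconds, state) tuples.

    Hypothetical helper: the gap ending at an "UP" event is counted as
    downtime, and the previous timestamp is kept per host so the first
    event of each host contributes nothing.
    """
    totals = defaultdict(float)
    prev_time = {}  # last seen timestamp for each host
    for host, ts, state in sorted(events, key=lambda e: (e[0], e[1])):
        if host in prev_time and state == "UP":
            totals[host] += ts - prev_time[host]
        prev_time[host] = ts
    return dict(totals)

# Made-up sample data: host "a" is down twice, host "b" once.
events = [
    ("a", 100, "DOWN"), ("a", 160, "UP"),
    ("b", 50, "DOWN"), ("b", 80, "UP"),
    ("a", 300, "DOWN"), ("a", 330, "UP"),
]
result = downtime_by_host(events)
print(result)
```

The search equivalent would need the same per-host "previous event" bookkeeping, which is what the hostdiff filter was meant to approximate.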
Thanks
Hi 🙂
Please upgrade to the latest release of Splunk for Nagios and let me know how you go 🙂
http://apps.splunk.com/app/352/
There are a number of new dashboards including:
Livestatus Host SLA
Livestatus Service SLA
All the best,
Luke 🙂