How to calc component availability data coming from nagios

auntyem · 09-10-2012

I am trying to get component availability from nagios data coming into splunk for the previous month. I'm ultimately trying to get the % of time available. I've developed the following search to calculate down time for a single host (hardware names have been changed to protect the innocent):

This appears to be working correctly. Then, if I can calculate total uptime (I'd like to eventually exclude maintenance windows, but they aren't in nagios yet, so I can't), I can get my percentage.

My questions are as follows:

I'm sure I'm not the first person to do this. Is there a better way?
If not, I can't find an easy way to calculate the uptime for the month by looking at the timespan of the search. Again, I'm sure I'm missing something there.
My search above is for a single host (src_host) in nagios. I'd like to do this on a group of hosts, using tags to get the group. I've tried the following to do this but it isn't quite working, though it's close:

Since I can't use a 'by' clause in the delta command, I was trying to use the hostdiff field created in the search as a way to filter out durations when the src_host changed (i.e. the previous event was from a different host, so the 'delta' shouldn't count).

We have four nagios hosts monitoring everything. I'd like to just pick one, but if there's a netsplit that makes a host register an 'outage', I don't want that to be the one host. I somehow want the 'best' host at any given point in time. Any suggestions here?
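For reference, the arithmetic behind the questions above can be sketched outside Splunk. This is only a rough Python illustration, not the search itself: it assumes each event reduces to a (timestamp, host, state) tuple with state "UP" or "DOWN", that a host is up at the start of the window until an event says otherwise, and the host names and the 1000-second window are made up. It tracks the outage start per host, which is what the hostdiff workaround is trying to achieve, and takes total time from the window bounds rather than from the events. Inside Splunk, `streamstats` does accept a `by` clause and `addinfo` exposes the search time range as `info_min_time` and `info_max_time`, so those may be the closer equivalents.

```python
def availability(events, window_start, window_end):
    """Return {host: percent available} over [window_start, window_end].

    events: iterable of (epoch_seconds, host, state), state "UP" or "DOWN".
    Assumption: a host counts as up from window_start until its first DOWN.
    """
    total = window_end - window_start
    down_since = {}  # host -> time the current outage began
    downtime = {}    # host -> accumulated seconds down

    for ts, host, state in sorted(events):
        if ts < window_start or ts > window_end:
            continue
        downtime.setdefault(host, 0)
        if state == "DOWN":
            # Keep only the first DOWN of an outage; repeats are ignored.
            down_since.setdefault(host, ts)
        elif host in down_since:
            # Recovery: close the outage for this host only.
            downtime[host] += ts - down_since.pop(host)

    # Outages still open at the end of the window run to window_end.
    for host, start in down_since.items():
        downtime[host] += window_end - start

    return {host: 100.0 * (total - down) / total
            for host, down in downtime.items()}


# Hypothetical events, interleaved across hosts, in a 1000-second window.
events = [
    (100, "web01", "DOWN"),
    (150, "db01", "DOWN"),
    (160, "db01", "DOWN"),   # repeated alert during the same outage
    (200, "web01", "UP"),
    (400, "db01", "UP"),
    (900, "app01", "DOWN"),  # never recovers inside the window
]
result = availability(events, 0, 1000)
print(result)
```

Because the outage start is stored per host, events from other hosts arriving in between do not affect a host's duration, so no filtering on a host-changed flag is needed.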

Thanks

lukeh · 10-15-2013

Hi 🙂

Please upgrade to the latest release of Splunk for Nagios and let me know how you go 🙂

http://apps.splunk.com/app/352/

There are a number of new dashboards including:

Livestatus Host SLA

Livestatus Service SLA

All the best,

Luke 🙂
