Splunk Enterprise

Alert when a host is down for more than 90 seconds

jlimlogic
New Member

We need to set up an alert when a host has been down for 90 seconds or more.

Log examples:

8:01:00 ServerA is DOWN
8:03:00 ServerA is UP
8:04:00 Server B is DOWN
8:05:00 Server B is UP

The example above shows two sets of up/down log entries, for ServerA and ServerB. ServerA was down for 2 minutes, which is greater than 90 seconds. ServerB was only down for 60 seconds. We'd like to be alerted for ServerA, either in real time or as soon as possible once it reaches the 90-second threshold, but not for ServerB, as it was only down for a minute.

Conversely, we'd also need to be alerted when ServerA comes back up.


jlimlogic
New Member

Actually, we need to receive the alert BEFORE the "8:03:00 ServerA is UP" event record is logged and indexed into Splunk. So the log will look more like this:

8:01:00 ServerA is DOWN
8:04:00 Server B is DOWN
8:05:00 Server B is UP

Essentially, once the current time is 8:02:31, Splunk should be alerting us that ServerA has been down for at least 90 seconds.

Would a TRANSACTION be more appropriate for this - perhaps a query based on an OPEN transaction? If so, how would we schedule the alert so that we do not miss any events?


somesoni2
Revered Legend

Try something like this, run as a regular scheduled search (assuming the _time field value corresponds to the time in the event that you want to use).

Search:

your base search earliest=-6m@m latest=-1m@m
| rex "^\S+\s+(?<Server>.+?) is (?<Status>\w+)"
| sort 0 Server _time
| streamstats current=f window=1 values(Status) as prev_Status by Server
| where isnull(prev_Status) OR Status!=prev_Status
| streamstats current=f window=1 values(_time) as prev_time values(Status) as prev_Status by Server
| eval StatusDuration=_time-prev_time
| where prev_Status="DOWN" AND StatusDuration>90

Cron: 1-59/5 * * * * (which is every 5 mins starting with min 1: 1,6,11,16....)
Alert condition: when number of events > 0

Agree with @mwdbhyat on not running a real-time search unless it's mission critical.
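
For the follow-up requirement (alerting while a host is still down, before any UP event has been indexed), one possible variation is to look only at each host's most recent status and compare its timestamp with the current time. This is an untested sketch rather than part of the answer above: the 24-hour lookback and the last_change and DownSeconds field names are assumptions chosen for illustration, and the rex assumes the log format shown in the question.

```
your base search earliest=-24h latest=now
| rex "^\S+\s+(?<Server>.+?) is (?<Status>UP|DOWN)"
| stats latest(_time) as last_change latest(Status) as Status by Server
| eval DownSeconds=now()-last_change
| where Status="DOWN" AND DownSeconds>=90
```

Scheduled every minute with the same "number of events > 0" condition, this should fire within roughly a minute of a host passing the 90-second mark. Because the host keeps matching for as long as it stays down, alert throttling on the Server field would probably be needed to avoid repeated notifications.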


mwdbhyat
Builder

I would check out these docs : http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/Alertexamples

Real-time search may be taxing on resources depending on how many searches you have running. I wouldn't recommend using it unless it's completely necessary. You could check how long the search takes to complete: if you set it to run every 60s and the search only takes 10s, then you are still within your window.
