Hey guys,
I'm trying to set up an alert that sends an email when an interface goes down and does not come back up within a certain timeframe. I'm assuming 10-15 minutes should suffice. We're having an issue where, after running updates or rebooting a server, the interface sometimes does not come back up properly.
This was a test run of what the logs would look like searching for USPK10OLLBS01 and /Common/tcp:
I'm pretty bad with the search logic, so I could really use some help! Thanks much. Below are the logs I'm working with, taken during a test down state on one of the interfaces. It reports a couple of ups, but only one down.
2/3/15
10:29:00.000 AM
Feb 3 10:29:00 10.10.0.19 Feb 3 10:29:06 uspk10ollbs01 notice mcpd[6642]: 01070727:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status up. [ /Common/tcp: up ] [ was node down for 0hr:0min:3sec ]
host = 10.10.0.19 source = udp:514 sourcetype = syslog
2/3/15
10:28:57.000 AM
Feb 3 10:28:57 10.10.0.19 Feb 3 10:29:03 uspk10ollbs01 notice mcpd[6642]: 01070638:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status node down. [ /Common/tcp: up ] [ was down for 0hr:0min:16sec ]
host = 10.10.0.19 source = udp:514 sourcetype = syslog
2/3/15
10:28:41.000 AM
Feb 3 10:28:41 10.10.0.19 Feb 3 10:28:47 uspk10ollbs01 notice mcpd[6642]: 01070638:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status down. [ /Common/tcp: down ] [ was up for 856hrs:4mins:2sec ]
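Try something like this (assuming the host name is NOT already extracted; if it is, drop the HostName capture group from the regex). The sample events write the units both ways ("0hr:0min" and "856hrs:4mins") and sometimes say "was node down for", so the pattern accepts both forms:
your base search | rex "(?<HostName>\w+)\snotice.*was (?:node )?down for (?<hour>\d+)hrs?\:(?<minute>\d+)mins?\:(?<second>\d+)sec\s*\]" | eval Downtime=round((hour*3600 + minute*60 + second)/60,2) | where Downtime>15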
You can schedule this search and set up an alert on it.
http://docs.splunk.com/Documentation/Splunk/6.2.1/Alert/Setupalertactions#Configure_email_notificati...
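One caveat: as far as I can tell from the sample events, the "was down for" duration is only logged on the next status change, so a search on that text fires after the fact rather than while the member is still down. If the goal is to alert during the outage, a rough sketch along these lines may be closer. It assumes the message format shown in the sample events. The field names pool, member, status, last_status and last_change are made up for illustration, and 900 seconds is just the 15-minute threshold from the question.

```
your base search "monitor status"
| rex "Pool (?<pool>\S+) member (?<member>\S+) monitor status (?<status>[^.]+)\."
| stats latest(status) as last_status latest(_time) as last_change by pool member
| where last_status!="up" AND (now() - last_change) > 900
```

Schedule it every few minutes over a time range long enough to include the last status change (for example the last 24 hours), and trigger the alert when the number of results is greater than zero. Anything other than "up" (including "node down") counts as down here, which may need tightening if other statuses show up in your logs.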