How to set up an alert to trigger when F5 Pools are down for more than 5 minutes with the transaction command?

davespatz · ‎01-28-2015

Hello,

We're load balancing Exchange servers behind our F5 LTMs. From time to time the Exchange services cycle for maintenance, and the F5 LTM marks the pool down when that happens, generating an alert we don't care about. My objective is to alert only if the pool has gone down (one syslog message) and has not come back up (another syslog message) within 5 minutes. Downtime over 5 minutes means something is really broken; under 5 minutes is probably nothing to worry about in our environment.

Sample Logs I Don't Want to Alert On (Downtime <5min)

Jan 28 06:40:03 example-ltm Jan 28 06:40:03 slot2/example-ltm notice mcpd[4907]: 01070638:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status down. [ /Common/Exchange_ad_http_monitor: down ]  [ was up for 125hrs:23mins:32sec ]
Jan 28 06:40:22 example-ltm Jan 28 06:40:22 slot2/example-ltm notice mcpd[4907]: 01070727:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status up. [ /Common/Exchange_ad_http_monitor: up ]  [ was down for 0hr:0min:19sec ]

Sample Logs I Do Want to Alert On (Downtime >5min)

Jan 28 08:32:23 example-ltm Jan 28 08:32:23 slot2/example-ltm notice mcpd[4907]: 01070638:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status down. [ /Common/Exchange_ad_http_monitor: down ]  [ was node down for 0hr:0min:14sec ]
Jan 28 08:36:03 example-ltm Jan 28 08:50:03 slot2/example-ltm notice mcpd[4907]: 01070727:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status up. [ /Common/Exchange_ad_http_monitor: up ]  [ was down for 0hr:3mins:40sec ]

It's important to note that I don't want to wait for the "up" syslog message. In other words, if I see a "down" message but no "up" message within 5 minutes, alert me. Since we have many pools, members, and LTMs (hosts), I'm using the transaction command below to group events by these fields. I want the transaction to start when the member goes down (I've already done field extractions for the up/down status) and complete when it comes back up. The thing is, I don't know how to complete it, or how to get alerted if no "up" message arrives within 5 minutes. For what it's worth, here's my search:

Splunk Sample Search (Incorrect)

tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*") | transaction f5_pool_name host f5_pool_member startswith(f5_member_status=down) endswith(f5_monitor_status=up) maxspan=5min keeporphans=true

I just want to be alerted if there is no "up" in the transaction within 5 minutes.

woodcock · ‎06-26-2015

Try this:

tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*") | streamstats current=f last(_time) AS prevTime BY f5_pool_name host f5_pool_member | dedup f5_pool_name host f5_pool_member | eval spanSeconds=_time-prevTime | where (f5_member_status="down" AND (now()-_time) >=300) OR (f5_member_status="up" AND spanSeconds>=300)
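For readers comparing approaches, a commonly used alternative is to ignore event ordering and keep only the most recent status per member. This is a minimal sketch, not a tested answer from this thread. It assumes the up/down value is extracted into f5_member_status (the question's search also references f5_monitor_status, so use whichever field actually exists in your data). The names last_time and last_status are made up for illustration.

```
tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*")
| stats latest(_time) AS last_time latest(f5_member_status) AS last_status BY host f5_pool_name f5_pool_member
| where last_status="down" AND (now() - last_time) >= 300
```

Scheduled every few minutes over a time range comfortably longer than 5 minutes, this should return a row only for members whose latest message is a "down" that is at least 300 seconds old, so there is no need to wait for an "up" message.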

davespatz · ‎06-26-2015

I'm running Splunk 6.2.1, which may be the reason, but I get the following error for that search:

Error in 'streamstats' command: The argument 'prev(_time)' is invalid.

woodcock · ‎06-26-2015

I had a typo; try it now.

davespatz · ‎06-27-2015

No syntax error now, but the search comes back instantly with no results, even over a 30-day period, when a search like this usually takes a while to run.

woodcock · ‎06-27-2015

Perhaps a field name has a typo or something. Pull off everything after the last pipe, one by one, until you get sensible data and then go back the other way fixing the typo. The logic all looks good to me.
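One way to apply that advice, assuming the suspicion is a field name mismatch, is to count how many matching events actually carry each field the search depends on. The question's search references both f5_member_status and f5_monitor_status, so this sketch checks both; the AS names are made up for illustration.

```
tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*")
| stats count AS total_events count(f5_member_status) AS with_member_status count(f5_monitor_status) AS with_monitor_status count(f5_pool_name) AS with_pool_name count(f5_pool_member) AS with_pool_member
```

A count of 0 for a field means it is not being extracted for these events. A BY clause on that field then drops every event, and a where comparison against it matches nothing, which would explain an instant empty result.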

woodcock · ‎07-06-2015

Did you get this to work?

bdoherty77 · ‎06-26-2015

Any luck with this alert? We are trying to do the same thing, but we're getting tripped up on the search.

Thanks

davespatz · ‎06-26-2015

None yet. I tried working with support on this a while back, but I was on an older version of Splunk and can't say for sure whether it was a bug. For whatever reason I just couldn't make the command work; it was probably my search string. If you ever get it to work reliably, let me know! I've had to move on to other things, so I didn't get back to troubleshooting this.
