How to set up an alert to trigger when F5 pools are down for more than 5 minutes with the transaction command?

davespatz
Explorer

Hello,

We're load balancing Exchange servers behind our F5 LTMs. From time to time the Exchange services cycle for maintenance, and the F5 LTM marks the pool down when that happens, generating an alert we don't care about. My objective is to alert only if the pool has gone down (one syslog message) but has not come back up (another syslog message) within 5 minutes. More than 5 minutes means something is really broken; less than 5 minutes is probably nothing to worry about in our environment.

Sample Logs I Don't Want to Alert On (Downtime <5min)

Jan 28 06:40:03 example-ltm Jan 28 06:40:03 slot2/example-ltm notice mcpd[4907]: 01070638:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status down. [ /Common/Exchange_ad_http_monitor: down ]  [ was up for 125hrs:23mins:32sec ]
Jan 28 06:40:22 example-ltm Jan 28 06:40:22 slot2/example-ltm notice mcpd[4907]: 01070727:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status up. [ /Common/Exchange_ad_http_monitor: up ]  [ was down for 0hr:0min:19sec ]

Sample Logs I Do Want to Alert On (Downtime >5min)

Jan 28 08:32:23 example-ltm Jan 28 08:32:23 slot2/example-ltm notice mcpd[4907]: 01070638:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status down. [ /Common/Exchange_ad_http_monitor: down ]  [ was node down for 0hr:0min:14sec ]
Jan 28 08:36:03 example-ltm Jan 28 08:50:03 slot2/example-ltm notice mcpd[4907]: 01070727:5: Pool /Common/Exchange_ad_pool member /Common/exampleServer:4169 monitor status up. [ /Common/Exchange_ad_http_monitor: up ]  [ was down for 0hr:3mins:40sec ]

It's important to note that I don't want to wait for the "up" syslog message. In other words, if I see a "down" message but no "up" message within 5 minutes, alert me. Since we have many pools, members, and LTMs (hosts), I'm using the transaction command below to group events by these fields. I want the transaction to start where the member went down (I've already done field extractions for the up/down status) and complete when it comes back up. The thing is, I don't know how to complete it, or how to get alerted if no "up" message arrives within 5 minutes. For what it's worth, here's my search below.

Splunk Sample Search (Incorrect)

tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*") | transaction f5_pool_name host f5_pool_member startswith(f5_member_status=down) endswith(f5_monitor_status=up) maxspan=5min keeporphans=true

I just want to get alerted if there is no "up" in the transaction within 5 minutes.

woodcock
Esteemed Legend

Try this:

tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*") | streamstats current=f last(_time) AS prevTime BY f5_pool_name host f5_pool_member | dedup f5_pool_name host f5_pool_member | eval spanSeconds=_time-prevTime | where (f5_member_status="down" AND (now()-_time) >=300) OR (f5_member_status="up" AND spanSeconds>=300)
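
For comparison, the "down and not back up after 5 minutes" condition can also be written without `transaction` or `streamstats`, by keeping only the most recent status per pool member and checking how old it is. This is a minimal, untested sketch rather than a confirmed fix. It assumes the `f5_pool_name`, `f5_pool_member`, and `f5_member_status` extractions from the question exist and that the status field holds the literal values `up` and `down`. The names `last_time`, `last_status`, and `down_seconds` are made up for illustration.

```spl
tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*")
| stats latest(_time) AS last_time latest(f5_member_status) AS last_status BY host f5_pool_name f5_pool_member
| eval down_seconds = now() - last_time
| where last_status="down" AND down_seconds >= 300
```

Run as a scheduled alert that triggers when the number of results is greater than zero, this should return one row per member whose latest message is "down" and at least 300 seconds old, so it does not wait for an "up" message. The search window has to be long enough to include the "down" event. A member that went down before the window started produces no row at all.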

davespatz
Explorer

I'm running Splunk 6.2.1, which may be why, but I'm getting the following error for the search:

Error in 'streamstats' command: The argument 'prev(_time)' is invalid.

woodcock
Esteemed Legend

I had a typo; try it now.

davespatz
Explorer

No syntax issue now, but the search comes back instantly with no results, even over a 30-day period, when a search like this usually takes a while to run.

woodcock
Esteemed Legend

Perhaps a field name has a typo or something. Remove the trailing pipeline stages one at a time, starting from the last pipe, until you get sensible data, then add them back one by one and fix the typo. The logic all looks good to me.
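
A quick way to check for a field-name problem is to count how many events actually carry each field the search depends on. This is a rough sketch, assuming the same base search and field names as in the question. The original search refers to both `f5_member_status` and `f5_monitor_status`, so both are counted here. The `with_*` output names are made up for illustration.

```spl
tag=ltm "monitor status" NOT ("monitor status node" OR "Node \/Common*")
| stats count count(f5_pool_name) AS with_pool count(f5_pool_member) AS with_member count(f5_member_status) AS with_member_status count(f5_monitor_status) AS with_monitor_status
```

If `count` itself is 0, the base search matches nothing, and the tag or search terms are the place to look. If `count` is nonzero but one of the `with_*` columns is 0, that field is not being extracted, which would be consistent with a search that returns no results.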

woodcock
Esteemed Legend

Did you get this to work?

bdoherty77
Engager

Any luck with this alert? We are trying to do the same thing, but we're getting tripped up on the search.

Thanks

davespatz
Explorer

None yet. I worked with support on this a while back, but I was on an older version of Splunk and can't say for sure whether it was a bug. For whatever reason I just couldn't make the command work; it was probably my search string. If you ever get it to work reliably, let me know! I've had to move on to other things, so I never got back to troubleshooting this.
