Splunk Search

How to trigger an alert if a search is still returning results after 10 minutes?

jmpirro
New Member

Currently, we have a search that is set to trigger if it returns a single result, and then throttle for 10 minutes before it can fire again.

We'd like to do roughly the opposite: if the search is STILL returning results (from the same host OR other hosts) after 10 minutes, THEN trigger an alert.

At the moment, this search fires as soon as matching results appear, but sometimes it's a single result and therefore a minor warning, and sometimes it's continuous (i.e., a service is actually down). We'd like to get Splunk to trigger only if the same alert is still firing after 10 minutes, which usually indicates a problem with a particular host.

0 Karma

woodcock
Esteemed Legend

Build a search that uses timechart count with a span= that covers what would normally be your search's time window, then count up how many threshold crossings you have with | where count>threshold | stats count | where count>10, and alert on that.
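A minimal sketch of that idea, assuming a 10-minute window, 1-minute buckets, and placeholders for your actual base search (index=my_alerts and a per-bucket threshold of one event are illustrative only):

    index=my_alerts earliest=-10m
    | timechart span=1m count
    | where count > 0                ``` this 1-minute bucket crossed the per-bucket threshold ```
    | stats count AS buckets_over
    | where buckets_over >= 10       ``` every bucket in the window crossed it ```

Set the alert to trigger on "number of results > 0": if any of the ten 1-minute buckets was quiet, the final where drops everything and nothing fires.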

0 Karma

DalJeanis
Legend

Basically, you want to create a test that says the alert condition has not NOT been true for X minutes. Typically, the question is, "How to alert when my CPU has been over X% for Y minutes?"

Which is to say, "How do I know my CPU has been OVER X% for Y minutes and has NOT been UNDER X% for those Y minutes?"

The overall strategy is: create records for Y minutes or more back, at whatever frequency you think is reasonable for your use case, that have either a 1 or a 0 in a field that means "the alert condition is true". Use streamstats to group them based on changes in that value. Finally, use eventstats to count the group, and if the group is large enough (has enough minutes or seconds) to meet your criteria, let the group through to throw the alert.
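As a rough illustration of that pattern using the CPU example (the index, field names, 95% threshold, and span=1m are all placeholders for whatever fits your data):

    index=my_metrics sourcetype=cpu earliest=-15m
    | timechart span=1m avg(cpu_pct) AS cpu_pct
    | eval is_alerting=if(cpu_pct > 95, 1, 0)      ``` 1 means the alert condition is true for this minute ```
    | streamstats current=f last(is_alerting) AS prev
    | eval new_group=if(isnull(prev) OR is_alerting!=prev, 1, 0)
    | streamstats sum(new_group) AS group_id       ``` group_id increments whenever the condition flips ```
    | eventstats count AS group_minutes BY group_id
    | where is_alerting>0 AND group_minutes>=10

Any row that survives belongs to a run of at least 10 consecutive minutes over the threshold, so "number of results > 0" is the alert condition.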

Here are a couple of those to review. The second one points to three more:

https://answers.splunk.com/answers/507811/how-to-edit-my-real-time-alert-to-trigger-when-ave.html
https://answers.splunk.com/answers/557838/create-an-alert-based-on-cpu-being-at-95-for-a-spa.html

0 Karma

somesoni2
Revered Legend

You can search over, say, the last 10 or 15 minutes, calculate the duration from the first alert event to the latest one, and trigger when that duration reaches 10 minutes.
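A quick sketch of that, assuming your existing alert search is the base (index=my_alerts is a placeholder) and the alert is scheduled over the last 15 minutes:

    index=my_alerts earliest=-15m
    | stats earliest(_time) AS first_seen latest(_time) AS last_seen count
    | eval duration_min=(last_seen - first_seen)/60
    | where count>1 AND duration_min>=10

A single event gives a duration of zero and nothing fires; if events keep arriving across 10 minutes or more of the window, the row survives and the alert triggers on "number of results > 0".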

0 Karma

jmpirro
New Member

The event itself is brief; I just need to know how many times it's fired over the last 10 minutes (and whether it's still going). Most of what I'm finding calculates event duration, as opposed to the time between events firing.

0 Karma