Splunk Search

How to trigger an alert if a search is still returning results after 10 minutes?

jmpirro
New Member

Currently, we have a search that is set to trigger if it returns a single result, and then throttle for 10 minutes before going again.

We'd like to kind of do the opposite: If the search is STILL returning results (same host OR other hosts) after 10 minutes' time, THEN trigger an alert.

At the moment, this search returns alerts as soon as they happen, but sometimes it's a single alert and therefore a minor warning, and sometimes it's continuous (i.e., a service is actually down). We'd like to get Splunk to trigger if the same alert is still firing after 10 minutes, which usually indicates a problem with a particular host.


woodcock
Esteemed Legend

Build a search that uses timechart count with a span= that covers what would normally be your search's time window, then count up how many threshold crossings you have with | where count>threshold | stats count | where count>10, and alert on that.
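
A rough sketch of that approach (the index, sourcetype, 1-minute span, and thresholds are placeholders to adapt; it assumes your current alert search effectively looks at about a 1-minute window and that a single matching event counts as a threshold crossing):

    index=your_index sourcetype=your_alert_source earliest=-10m
    | timechart span=1m count
    | where count > 0
    | stats count AS buckets_breached
    | where buckets_breached >= 10

Each 1-minute bucket with at least one matching event counts as a crossing, and the final where only returns a row (and therefore fires the alert) when every bucket in the 10-minute window crossed.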


DalJeanis
Legend

Basically, you want to create a test that says the alert condition has not NOT been true for X minutes. Typically, the question is, "How to alert when my CPU has been over X% for Y minutes?"

Which is to say, "How do I know my CPU has been OVER X% for Y minutes and has NOT been UNDER X% for those Y minutes?"

The overall strategy is: create records for Y minutes or more back, at whatever frequency you think is reasonable for your use case, each with a 1 or 0 in a field that means "the alert condition is true". Use streamstats to group them based on changes in that value. Finally, use eventstats to count each group, and if the group is large enough (has enough minutes or seconds) to meet your criteria, let it through to throw the alert.
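
Here's a minimal sketch of that pattern as a hypothetical CPU example (the index, sourcetype, pctCPU field, 95% threshold, 1-minute frequency, and 10-minute run length are all assumptions to adjust):

    index=your_index sourcetype=your_cpu_data earliest=-30m
    | timechart span=1m avg(pctCPU) AS cpu_pct
    | fillnull value=0 cpu_pct
    | eval breach=if(cpu_pct > 95, 1, 0)
    | streamstats count(eval(breach=0)) AS group_id
    | eventstats sum(breach) AS run_len BY group_id
    | where breach=1 AND run_len >= 10

timechart creates one record per minute, eval flags whether the alert condition is true in that minute, streamstats gives every consecutive breached minute the same group_id (the counter only advances on non-breached minutes), eventstats measures the length of each breached run, and the final where only passes rows once a run has lasted 10 minutes or more.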

Here are a couple of those to review; the second one points to three more:

https://answers.splunk.com/answers/507811/how-to-edit-my-real-time-alert-to-trigger-when-ave.html
https://answers.splunk.com/answers/557838/create-an-alert-based-on-cpu-being-at-95-for-a-spa.html


somesoni2
Revered Legend

You can search over something like the last 10 or 15 minutes, calculate the duration from the first alert event to the latest alert event, and trigger when that duration reaches 10 minutes.
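
A minimal sketch of that idea (the index, sourcetype, and window are placeholders; it splits by host since the original question mentions particular hosts):

    index=your_index sourcetype=your_alert_source earliest=-15m
    | stats earliest(_time) AS first_seen latest(_time) AS last_seen count BY host
    | eval duration_min=(last_seen - first_seen) / 60
    | where duration_min >= 10

Each host only survives the final where when its alert events span at least 10 minutes of the search window, so a single one-off event (duration 0) never triggers.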


jmpirro
New Member

The event itself is brief; I just need to know how many times it's fired over the last 10 minutes (and whether it's still going). Most of what I'm finding calculates the duration of an event, as opposed to the time between events firing.
