Splunk Search

How to trigger an alert if a search is still returning results after 10 minutes?

jmpirro
New Member

Currently, we have a search that is set to trigger if it returns a single result, and then throttle for 10 minutes before it can fire again.

We'd like to kind of do the opposite: If the search is STILL returning results (same host OR other hosts) after 10 minutes' time, THEN trigger an alert.

At the moment, this search returns alerts as soon as they happen, but sometimes it's a single alert and therefore a minor warning, and sometimes it's continuous (aka a service is actually down). We'd like to get Splunk to trigger if the same alert is still firing after 10 minutes, which usually indicates a problem with a particular host.

woodcock
Esteemed Legend

Build a search that uses timechart count with a span= that covers what would normally be your search's time window, then count up how many threshold crossings you have with | where count>threshold | stats count | where count>10, and alert on that.
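
A minimal sketch of that approach, assuming one-minute buckets over a 10-minute window and a placeholder base search (your index, sourcetype, and per-bucket threshold will differ):

index=main sourcetype=service_alerts earliest=-10m@m latest=@m
| timechart span=1m count
| where count > 0
| stats count AS buckets_over_threshold
| where buckets_over_threshold >= 10

With count > 0 as the per-bucket threshold, this only returns a row when every one of the last 10 one-minute buckets contained at least one alert event, so the alert condition can simply be "number of results is greater than 0".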

DalJeanis
Legend

Basically, you want to create a test that says the alert condition has not NOT been true for X minutes. Typically, the question is, "How to alert when my CPU has been over X% for Y minutes?"

Which is to say, "How do I know my CPU has been OVER X% for Y minutes and has NOT been UNDER X% for those Y minutes?"

The overall strategy is: create records for Y minutes or more back, at whatever frequency you think is reasonable for your use case, that have either a 1 or 0 for a field that means "the alert condition is true". Use streamstats to group them based on changes in that value. Finally, use eventstats to count the group, and if the group is large enough (has enough minutes or seconds) to meet your criteria, then let the group through to throw the alert.
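
A rough sketch of that streamstats/eventstats pattern, assuming one-minute buckets and treating "at least one alert event in the bucket" as the alert condition (the index, sourcetype, and thresholds here are placeholders):

index=main sourcetype=service_alerts earliest=-30m@m latest=@m
| timechart span=1m count
| eval alert_condition = if(count > 0, 1, 0)
| streamstats current=f window=1 last(alert_condition) as prev_condition
| eval changed = if(alert_condition != coalesce(prev_condition, alert_condition), 1, 0)
| streamstats sum(changed) as group_id
| eventstats count as group_minutes by group_id
| where alert_condition > 0 AND group_minutes >= 10

Each run of consecutive buckets with the same alert_condition value gets one group_id, eventstats sizes the group, and only groups where the condition has been true for 10 or more consecutive minutes make it through to fire the alert.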

Here are a couple of those to review; the second one points to three more:

https://answers.splunk.com/answers/507811/how-to-edit-my-real-time-alert-to-trigger-when-ave.html
https://answers.splunk.com/answers/557838/create-an-alert-based-on-cpu-being-at-95-for-a-spa.html

somesoni2
Revered Legend

You can search over, say, the last 10 or 15 minutes, calculate the duration from the first alert event to the last/latest alert event, and alert when that duration reaches 10 minutes.
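
One way to express that, assuming the base search returns only the alert events themselves and grouping by host (the names here are placeholders):

index=main sourcetype=service_alerts earliest=-15m
| stats earliest(_time) as first_seen latest(_time) as last_seen by host
| eval duration_mins = round((last_seen - first_seen) / 60, 1)
| where duration_mins >= 10

Any host whose alert events span 10 minutes or more within the search window produces a row, which can then drive the alert.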

jmpirro
New Member

The event itself is brief; I just need to know how many times it's fired over the last 10 minutes (and whether it's still going). Most of what I'm finding calculates event duration as opposed to the difference between events firing.
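
For what it's worth, a sketch of that shape (a count of firings per host plus a check that the latest one is recent; the base search, field names, and the 2-minute "still going" window are assumptions):

index=main sourcetype=service_alerts earliest=-10m
| stats count latest(_time) as last_seen by host
| where count > 1 AND last_seen >= relative_time(now(), "-2m")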
