Alerting

Alert that triggers based on errors per minute over time

karlduncans
Engager

I'm looking for a way to make an alert trigger only if a certain amount of events occur within a 3 minute period, per minute.

Right now:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg

Running every 3 minutes, and alert set to trigger if events > 5

Problem is, if there is a single minute spike of errors that exceeds 5 errors within that 3 minute search period, the alert will trigger. I'm looking for help in having the alert trigger only if each minute during that 3 minute period exceeds 5 errors.

Thanks in advance!

Tags (1)
0 Karma
1 Solution

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

View solution in original post

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

karlduncans
Engager

Your 2nd query is what i was looking for. Thank you!

0 Karma

chanfoli
Builder

So are you trying to alert when the overall count of errors exceeds 5 for all 3 minutes or do the specific different errormsg values need to factor in?

0 Karma

pradeepkumarg
Influencer

Does your search always results in 3 rows? If so, you can try something like this

index=stuff earliest=-4m@m latest=-1m@m 
| bucket span=1m _time
| stats count by _time errormsg | eval flag = if(count >5, 1, 0) | eventstats sum(flag) as total | search total = 3
0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...