Alerting

Rapid growth of values, then alerting on it.

tkrn
Engager

Recently we implemented AlertThrottle, a terrific little app that limits (in our case) the number of emails sent once something passes a particular value. The second half of our task is to identify events that occur below a threshold but represent a significant jump in value.

Example: a disk is 20% full and jumps to 50% in the space of an hour, but doesn't trigger the disk alert, which is set at 85%.

I am looking for guidance or suggestions on how to go about this. Basically, we want to compare two values over a period of time and trigger an alert if the difference exceeds a limit within that moving time frame.
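
In pseudo-search terms, something like the following sketch captures what we're after (the index, sourcetype, and used_pct field are just placeholders for our disk data):

index=os sourcetype=df mount="/data"
| timechart span=1h max(used_pct) as used_pct
| streamstats window=2 earliest(used_pct) as previous_used_pct
| eval growth=used_pct-previous_used_pct
| where used_pct<85 AND growth>=20

That would flag an hour-over-hour jump of 20 points or more even while usage is still below the existing 85% alert threshold.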

1 Solution

ftk
Motivator

You can calculate the percent difference over your time range and then alert if the difference is higher than your threshold amount. For example, let's say you want to know the % difference of the disk_space field:

your search
| stats range(disk_space) as difference list(disk_space) as list
| streamstats max(list) as maxSelect window=1
| eval percent_difference=((difference/maxSelect)*100)

Then have an alert condition that hits when percent_difference > 30 for a 30% increase alert.
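
One way to wire that up (just an option; field names as above) is to append the threshold as a where clause and have the alert trigger whenever the search returns any results:

your search
| stats range(disk_space) as difference list(disk_space) as list
| streamstats window=1 max(list) as maxSelect
| eval percent_difference=((difference/maxSelect)*100)
| where percent_difference>30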

oscarminassian
Path Finder

This is really great @ftk! I re-purposed it for a SQL replication alert that is often very, very spiky (values from 0 up to 25,000 and back to 0 in a 3-minute span). I changed the logic a little to ensure we had an actual problem for a set period of time. The following search is for 5 data points over a 5-minute time frame.

index=sql source=blah sourcetype=sp_pendingcmds
| where pendingcmdcount>= 10000
| stats range(pendingcmdcount) as difference list(pendingcmdcount) as list 
| streamstats latest(list) as maxSelect window=1, count(list) as listcount 
| where listcount>=5
| eval percent_difference=((difference/maxSelect)*100)

I then set the alert to check percent_difference > 50.

It's working a treat. I hope it helps someone else.

richcollier
Path Finder

This is pretty good; however, you are still specifying a fixed threshold for the alert condition (in this case, 30%). How do you know that 30% is the right choice?

What's more effective is to use an anomaly detection approach to determine whether the current data is statistically unlikely given observed past behavior/values. That type of analysis is inherently hard to do on your own, but there's an app called Prelert Anomaly Detective that will do it for you!
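
If you want to stay in plain SPL, a rough approximation of that idea (a simple standard-deviation baseline, not Prelert's method; field names are placeholders) would be to compare recent values against a longer history:

your search earliest=-7d
| eventstats avg(disk_space) as avg_space stdev(disk_space) as stdev_space
| eval zscore=(disk_space-avg_space)/stdev_space
| where _time>=relative_time(now(),"-1h") AND zscore>3

This flags events from the last hour that sit more than three standard deviations above the week's average, rather than relying on a hand-picked percentage.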

ftk
Motivator

I imagine you could do your stats on a by server basis. | stats range(disk_space) as difference list(disk_space) by host
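
Putting that together with the accepted answer, a per-host version (same caveats about field names) might look like this; since stats is already grouping by host, max(disk_space) can stand in for the list/streamstats step:

your search
| stats range(disk_space) as difference max(disk_space) as maxSelect by host
| eval percent_difference=((difference/maxSelect)*100)
| where percent_difference>30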

muebel
SplunkTrust

This works for one host/filesystem pair, but falls apart when the search contains results from many different filesystems/hosts. Is there any way to account for that besides limiting the search to one host/filesystem?
