Alerting

calculate avg value over time - alert if 200% increase

sonicZ
Contributor

Hi,

I am trying to track a value on a backend server and alert if a certain operation spikes to greater than 200% of the average value per 5 minutes. I'm not sure how to do the alert part unless I enter a static value like this and alert on the eval "high" value.

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m | timechart span=5m count by host | eval BE_spike = if( count > 2000, "high", "normal")

What's the best way to schedule an alert if the OPERATION=Validate average spikes higher than 200% of the previous values over time?

1 Solution

lguinn2
Legend

Try this:

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
source="/app/logs/vipservices/vipservices.log" earliest=-5m 
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
    source="/app/logs/vipservices/vipservices.log" earliest=-30d latest=-5m
    | bucket span=5m _time
    | stats count by host 
    | stats avg(count) as Average by host ]
| where Last5Minutes > Average
| table host Last5Minutes Average

And set the alert to trigger when the number of results is greater than zero.

Test it by removing the where command. Also, I updated this after I realized that the original (using timechart) wasn't working properly.
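
To get the 200% threshold from the original question rather than any increase over the average, the final comparison could be tightened to twice the average, e.g.:

| eval doubleAVG = 2*Average
| where Last5Minutes > doubleAVG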


dwaddle
SplunkTrust

There is a video from a presentation by Jesse Trucks at a recent Splunk Live that covers just about this exact topic. Watch it at https://vimeo.com/66779015

sonicZ
Contributor

I have another request on this answer: what if I want to do the same query but compare the last 5 minutes vs. the last 12 / 24 hours? I am messing around with spans and dividing the avg(count)... math is hard 🙂
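
One way this might look, keeping the structure of the accepted answer: only the baseline subsearch's window is widened (here to the last 24 hours; earliest=-12h would give 12), with the 200% condition on the end:

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
source="/app/logs/vipservices/vipservices.log" earliest=-5m 
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
    source="/app/logs/vipservices/vipservices.log" earliest=-24h latest=-5m
    | bucket span=5m _time
    | stats count by host 
    | stats avg(count) as Average by host ]
| where Last5Minutes > 2*Average
| table host Last5Minutes Average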


sonicZ
Contributor

Awesome, this works. Thanks again!


lguinn2
Legend

All I did was use eval to create a new field called orig_host in the first search. You could also use rename.
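
For reference, either of these in the first search lines the field names up for the join. eval keeps the original host field alongside the copy:

| eval orig_host = host

or, replacing the field instead of copying it:

| rename host AS orig_host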


lguinn2
Legend

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| eval orig_host = host
| join orig_host
[ search index=summary_vip orig_host=ship*be* OR orig_host=van*be* OP="Validate"
source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m
| bucket span=5m _time
| stats count by orig_host
| stats avg(count) as Average by orig_host ]
| eval doubleAVG=(2*Average)
| where Last5Minutes > doubleAVG
| table orig_host Last5Minutes Average doubleAVG


sonicZ
Contributor

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
source="/app/logs/vipservices/vipservices.log" earliest=-5m 
| stats count as Last5Minutes by host
| join host, orig_host 
[ search index=summary_vip  orig_host=ship*be* OR orig_host=van*be* OP="Validate" 
source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m 
| bucket span=5m _time  
| stats count by orig_host  
| stats avg(count) as Average by orig_host ] 
| eval doubleAVG=(2*Average) 
| where Last5Minutes > doubleAVG
| table orig_host Last5Minutes Average doubleAVG

sonicZ
Contributor

Hi Lisa,
Based on your answer I think I am getting close... I already have a saved search gathering some Validate operations into a summary index.
The problem is I can't do a join of orig_host to host, because the summary index stores hosts as orig_host while the regular vip index uses host. Do you know of any workarounds?


sonicZ
Contributor

Lisa, I tried the subsearch you posted above with lower earliest values:
"earliest=-1h" returns averages of around 10k per host across the 8 hosts
"earliest=-4h" returns averages of around 40-70k per host
"earliest=-6h" returns averages of around 70-90k per host

earliest=-30d would take way too long to finish.
If I just want to run it every 5 minutes, should I make it earliest=-10m latest=-5m? And how would I make the "where Last5Minutes > Average" only alert if 200% of the average is reached?


sonicZ
Contributor

Lisa, this definitely returns results: 8k or so within -5m.


lguinn2
Legend

What does this return?

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m

And note that I have updated my answer above!


sonicZ
Contributor

Lisa, thanks. I could not get the above search to return results. I tried lowering the earliest to earliest=-1h, but am still not getting results, even with just the subsearch.

I'll try again Monday.


sonicZ
Contributor

index="vip" host=ship*be* OR host=van*be* operation=Validate source="/app/logs/vipservices/vipservices.log" 
| timechart count span=1m 
| streamstats window=20 avg(count) as avgCount 
| fields _time avgCount

Or

index="vip" host=ship*be* OR host=van*be* operation=Validate source="/app/logs/vipservices/vipservices.log" | timechart span=1m avg(count) as avgcount |  bucket _time span=1m
| stats count by _time
| stats avg(count) as AverageCount | streamstats avg(AverageCount) as Strm_AverageCount

I'm getting the averages, but failing to compare them to previous values over time.
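
A sketch of one possible way to make that comparison in a single search, assuming the same base search: streamstats with current=f computes a trailing average of the preceding 5-minute buckets (window=24, roughly the previous two hours), and the where clause keeps only buckets above twice that trailing average:

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-4h
| timechart span=5m count
| streamstats window=24 current=f avg(count) as trailingAvg
| where count > 2*trailingAvg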
