Hi Guys,
I could really use an ongoing alert that catches a sudden rise (spike) in a certain error code (such as 404 or 502).
I have given some thought to how to achieve that, and... well... I could really use your help 🙂
From my understanding, the search query should "know" or "sense" the normal traffic (not sure for how long, maybe 1 or 2 hours) and alert when there is a spike in the error code compared to 1-2 hours ago.
I think the alert should fire when the error code accounts for more than 5% of total traffic for longer than 90 seconds.
I appreciate your help.
Hi @gingersoftware
My name is Anam Siddique and I am the Community Content Specialist for Splunk Answers. Please accept the appropriate answer that worked for you so other members of the community can benefit from it. If none of the answers have worked for you so far please post further comments so someone can help you.
Thanks
Timewrap will do the trick.
Check out this INCREDIBLE answer by @mmodestino here:
https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html
I heard that he was going to create a blog post or app based on this. What is the evolution of this answer, @mmodestino?
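In case it helps while reading the linked answer, here is a minimal sketch of the timewrap idea. It is not taken from that answer. The index placeholder, the status=404 filter (which assumes a status field is extracted), and the 5-minute span are all assumptions to adapt. timechart counts the errors per bucket, and timewrap then folds the time range so that each earlier hour becomes its own column next to the latest one.

```
index=(your index) status=404
| timechart span=5m count
| timewrap 1h
```

Run it over the last few hours and check the column names that timewrap produces (they depend on its series option) before adding a where clause that compares the latest hour with the earlier ones.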
I use the predict command when I create alerts based on statistical analysis. I think it is easier to tune the prediction parameters to the current situation than to work out the comparison logic by hand.
index=(your index) ("404" OR "502" OR ...)
| timechart span=90s count
| predict lower95=lower upper95=upper algorithm=LL count as predict
| where count>'upper(predict)'
Note: the points to adjust are span=90s, upper95, the time range, and (optionally) the algorithm.
Thanks.
Could you help me modify this search to fit your description?
tag=NginxLogs host=www1 OR host=www2
| stats count by status
| eventstats sum(count) as total
| eval perc=round((count/total)*100,2)
| where status="404" AND perc>5
Thanks
Predictions are the way to go.
For example, something like this:
tag=NginxLogs host=www1 OR host=www2
|timechart span=1h count as total,count(eval(status="404")) as count
|eval perc=round((count/total)*100,2)
|fields - count,total
|predict lower95=lower upper95=upper algorithm=LL perc as predict
|where perc>'upper(predict)'
This is only a sample, so please adjust the parameters for your actual environment and try it.
If you remove the final where clause, you can check the actual and predicted values on the graph.
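To also enforce the 5% floor from the original question, one option is to add it as a second condition in the final where clause. This is a sketch and has not been tested against your data. With it, a row is returned only when the error percentage is both above the predicted upper bound and above 5. The status 404 comes from the question, and span=1h is just the sample value used in this thread.

```
tag=NginxLogs host=www1 OR host=www2
| timechart span=1h count as total, count(eval(status="404")) as count
| eval perc=round((count/total)*100,2)
| fields - count,total
| predict lower95=lower upper95=upper algorithm=LL perc as predict
| where perc>'upper(predict)' AND perc>5
```

Saved as an alert that triggers when the number of results is greater than 0, this should fire only on buckets that pass both checks. A smaller span brings it closer to the 90-second window from the question, at the cost of noisier predictions.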