Alerting

How to trigger an alert when HTTP error status codes increase 5% for 3 consecutive minutes

mlui_2
Explorer

Hi guys

How do I create an alert trigger with the following criteria?

A 5% increase in error status codes for 3 consecutive minutes reports as "Warning"; a 5% increase for 5 consecutive minutes reports as "Error".

The base search is something like:

index=apacheaccesslogs | fields status | timechart span=1m count by status

Thanks in advance

1 Solution

jacobpevans
Motivator

Greetings @mlui_2,

Please take a look at these run-anywhere searches. This sounds like a perfect fit for the transpose (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose) command. Yours will still use count instead of sum(count).

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=102 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=103 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=104 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=106 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", (100*('row 4' - 'row 1') / 'row 1'),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", (100*('row 6' - 'row 1') / 'row 1'),"N/A")

And here's what the full alert would look like:

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=110 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=115 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=120 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=125 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2),"N/A")
| eval Alert_Type = case (Percent_Increase_5_Mins>5,"Error",
                          Percent_Increase_3_Mins>5,"Warning")
| where isnotnull(Alert_Type)
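
For reference, a sketch of the same logic applied to your original base search (untested; it assumes the status field carries HTTP codes like 500, so the match on a leading "5" picks out the server-error columns):

```spl
index=apacheaccesslogs
| fields status
| timechart span=1m count by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(match(column, "^5"), round((100*('row 4' - 'row 1') / 'row 1'), 2), "N/A")
| eval Percent_Increase_5_Mins = if(match(column, "^5"), round((100*('row 6' - 'row 1') / 'row 1'), 2), "N/A")
| eval Alert_Type = case(Percent_Increase_5_Mins>5, "Error",
                         Percent_Increase_3_Mins>5, "Warning")
| where isnotnull(Alert_Type)
```

Schedule it over the last 6 minutes (span=1m gives rows 1 through 6) and trigger when results are returned.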
Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.



dmarling
Builder

Is your 5% increase based on each subsequent minute, so it's exponential growth you are alerting on, or some other aggregate?

If this comment/answer was helpful, please up vote it. Thank you.

mlui_2
Explorer

Based on the requirement I got, it is based on each subsequent minute.

But this could lead to false-positive alerts. I'm open to suggestions on how the alert should work.
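
One way to enforce the minute-over-minute interpretation while cutting false positives is to require every single minute in the window to exceed the threshold, e.g. with streamstats. A rough, untested sketch (the status filter and field names are assumptions):

```spl
index=apacheaccesslogs status>=500
| timechart span=1m count
| streamstats current=f window=1 last(count) as prev_count
| eval pct_increase = round(100*(count - prev_count)/prev_count, 2)
| streamstats window=3 count(eval(pct_increase>5)) as increases_3m
| streamstats window=5 count(eval(pct_increase>5)) as increases_5m
| eval Alert_Type = case(increases_5m=5, "Error",
                         increases_3m=3, "Warning")
| where isnotnull(Alert_Type)
```

Here increases_3m=3 means all three of the last three minutes each grew by more than 5% over the minute before, which is stricter than comparing only the endpoints of the window.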
