Alerting

How to set up an alert when the error count of the latest week is greater than the average of all weeks in the past 30 days?

allladin101
Explorer

Hi All,

I want to know whether I can set up an alert that fires when the error count of the latest week is greater than the mean weekly error count over the past 30 days. My current query is:

index=tms_uat* ERR  earliest=-30d@d latest=-0d@d tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*")| timechart span=7d count by tms_logcat limit=40

Can someone please help?

strive
Influencer

If you need an answer without the timewrap app, try this:

<Some Search Terms...> earliest=-4w@w latest=-0w@w
| bucket _time span=1w
| stats count as TotalErrors by _time
| eventstats mean(TotalErrors) as Mean
| sort 1 -_time
| eval alertCode = if(TotalErrors>Mean,1,0)

This is strictly based on the last four complete weeks (the time range is snapped to week boundaries), so it won't consider data from the current week.
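A possible alert-ready variant, sketched under a few assumptions and not tested against real data: it puts the filter terms from the question in place of <Some Search Terms...>, swaps bucket + stats for timechart so that a week with no errors still appears as a row with 0 (with stats ... by _time such a week would simply be missing and the mean would be taken over fewer weeks), and ends in a where clause so the alert can use the trigger condition "Number of results is greater than 0" instead of checking alertCode. The weekly span is the same as in the search above.

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-4w@w latest=-0w@w
| timechart span=1w count as TotalErrors
| eventstats mean(TotalErrors) as Mean
| sort 1 -_time
| where TotalErrors > Mean
```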

If by "latest week" you mean "the last 7 days, excluding today", change earliest to -28d@d, latest to -0d@d, and span to 7d.
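For the "last 7 days" interpretation, here is a sketch that assigns each event to a 7-day window counted back from midnight today with eval, rather than depending on how a 7d span is aligned. The field names weeks_ago and AllErrors are hypothetical. Dividing by 4 (the number of 7-day windows in 28 days) keeps a window with zero errors in the average. Each window is treated as exactly 604800 seconds, so around a daylight-saving change a boundary can be off by an hour.

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-28d@d latest=-0d@d
| eval weeks_ago=ceil((relative_time(now(), "@d") - _time) / 604800) - 1
| stats count as TotalErrors by weeks_ago
| eventstats sum(TotalErrors) as AllErrors
| eval Mean=AllErrors/4
| where weeks_ago=0 AND TotalErrors > Mean
```

weeks_ago=0 is the most recent 7 days and weeks_ago=3 the oldest. If the latest window has no errors at all, no row is returned and the alert does not fire, which is the expected result because 0 cannot exceed the mean.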

As I mentioned in my earlier comment, you should avoid running this search on raw data if the log volume is very high.
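One way to avoid scanning weeks of raw data, sketched under assumptions: schedule a search once a day, shortly after midnight, that writes the previous day's per-category error counts into a summary index with collect. The index name summary_tms_errors is hypothetical, and the index has to exist before collect can write to it.

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-1d@d latest=-0d@d
| bin _time span=1d
| stats count by _time, tms_logcat
| collect index=summary_tms_errors
```

The weekly comparison can then read these small daily rows and add up their count field (for example with sum(count)) instead of counting raw events.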

MuS
SplunkTrust

Hi alladin101,

This is another good use case for the timewrap app. Take this run-everywhere search and adapt it to your needs:

index=_internal source=*metrics.log earliest=-30d@d 
| timechart span=1w count 
| timewrap w series=short 
| eval mean=(s1+s2+s3)/3 
| where s0 > mean

The timechart counts the events for each week, timewrap turns each week into its own field (s0, s1, ..., where s0 is the latest week), the eval calculates the mean of the three weeks before it, and the where checks whether the latest week's event count is higher than that mean.
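Applied to the search from the question, an adaptation could look like the sketch below. Assumptions: latest=-0w@w is added so the data stops at the start of the current week and the most recent series covers a whole week, the range is cut to four whole weeks, and the mean is taken over all four weeks as the question asks. The field names s0 to s3 follow the run-everywhere example above; check the names timewrap actually produces in your Splunk version before relying on them.

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-4w@w latest=-0w@w
| timechart span=1w count
| timewrap w series=short
| eval mean=(s0+s1+s2+s3)/4
| where s0 > mean
```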

Keep in mind that, depending on the event count, this search can take some time to complete.

Hope this helps to get you started.

cheers, MuS

allladin101
Explorer

The volume is not evenly distributed, but I would say it is mostly high.

Not using any summary index yet.

strive
Influencer

What is your log volume? If it is high, it is not a good idea to run this search over the last few weeks of raw data.

Are you summarizing the data and storing it in a summary index?
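Once a daily summary exists, the alert only has to add up those rows. A sketch, assuming a hypothetical summary index summary_tms_errors that holds one row per day and error category with a count field (for example, written by a scheduled daily search ending in collect). Because sum() over an empty week yields no value, fillnull sets such weeks to 0 so they still count toward the mean.

```spl
index=summary_tms_errors earliest=-4w@w latest=-0w@w
| timechart span=1w sum(count) as TotalErrors
| fillnull value=0 TotalErrors
| eventstats mean(TotalErrors) as Mean
| sort 1 -_time
| where TotalErrors > Mean
```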