Splunk Search

How to search the percentage increase and display top 10 error codes with the highest increase in the last 24 hours?

demkic
Explorer

Hello,

I am stuck on my search and was hoping I could get some help.

I am trying to calculate the % increase and display the top 10 error codes with the highest % increase in the last 24 hours compared to the daily average number of errors last week. Here is my search:

base search ... earliest=-7d@d latest=@d
| bucket _time span=1d 
| stats count(errorType) as errorType by _time
| eventstats avg(errorType) as LastWeekDailyAverage 
| eval LastWeekDailyAverage=round(LastWeekDailyAverage, 0) | append 
    [ base search ... earliest=-1d@d latest=@d
    | stats count(errorType) as yesterday_errorType] 
| eval percent_increase=((errorType-LastWeekDailyAverage)/LastWeekDailyAverage)*100

Calculation should look like: [(today's count - dailyAverage last week)/dailyAverage last week)]*100.
I am unable to figure out how to calculate the daily average last week and then use it in my eval function to calculate the % increase and display only for the last 24 hours by error type.

I envision the output to be 10 bar graphs each representing an error typeI for the last 24 hours, and sorted by error type with the highest % increase to the lowest % increase.

I hope this makes sense,
Thank you

0 Karma
1 Solution

gokadroid
Motivator

How about you try this and see if it works the way you wanted:

base search ... earliest=-7d@d 
| eval myTime=case( _time < relative_time(now(), "-24h"), "Last_Week", _time >= relative_time(now(), "-24h"), "Today", 1=1, "Other")
| chart count over errorType by myTime 
| eval percChange=round( ('Today'-('Last_Week'/7))*100/('Last_Week'/7) , 2)
| chart max(percChange) over errorType
| sort 10 percChange

View solution in original post

sundareshr
Legend

See if this gives you what you're looking for

base search ... earliest=-7d@d latest=@d
| eval when=if(_time<relative_time(now(), "@d"), "LastWeek", "Today")
| bin span=1d _time
| eventstats count by _time errorType
| chart avg(count) as avg over errorType by when
| eval percent_increase=(Today-LastWeek)/LastWeek*100
| sort 10 percent_increase
0 Karma

demkic
Explorer

I think it's really close - but just so that I can double check do you mind letting me know how can I quickly calculate the dailyaverage number of errors for a specific error (I will specify in the base search that errorType=XXX over the last week and find the number of errors today (precisely over the last 24 hours from when I run the search).

0 Karma

sundareshr
Legend

How about this

... errorType=XXX earliest=-2d@d | timechart span=1d count | stats avg(count) as daily_avg
0 Karma

demkic
Explorer

Thank you, but is it possible to see the daily_avg (over a period of 7 days) per errorType?

base search...errorType=* earliest=-7d@d | timechart span=1d count | stats avg(count) as daily_avg by errorType 

The above query does not seem to work, many thanks for your time.

0 Karma

demkic
Explorer

I should also mention from my original question from up above when I said "Calculation should look like: [(today's count - dailyAverage last week)/dailyAverage last week)]*100" by today's count I actually mean the last 24 hours ago.

0 Karma

gokadroid
Motivator

How about you try this and see if it works the way you wanted:

base search ... earliest=-7d@d 
| eval myTime=case( _time < relative_time(now(), "-24h"), "Last_Week", _time >= relative_time(now(), "-24h"), "Today", 1=1, "Other")
| chart count over errorType by myTime 
| eval percChange=round( ('Today'-('Last_Week'/7))*100/('Last_Week'/7) , 2)
| chart max(percChange) over errorType
| sort 10 percChange

gokadroid
Motivator

Hi @demkic,

Straight from the docs:

relative_time(X,Y)  This function takes an epochtime time, X, as the first argument and a relative time specifier, Y, as the second argument and returns the epochtime value of Y applied to X.  ... | eval n=relative_time(now(), "-1d@d")  

So the below statement:

| eval myTime=case( _time < relative_time(now(), "-24h"), "Last_Week",
 _time >= relative_time(now(), "-24h"), "Today", 
1=1, "Other") 

Means

| eval myTime=case( if timestamp on the event is less than timestamp which gets generated by subtracting 24 hours from the time when this query is running, Mark this even with a value "Last Week",
if timestamp on the event is greater than timestamp which gets generated by subtracting 24 hours from the time when this query is running, Mark this even with a value "Today",
If anything else , mark that as "Other"

So after this command, every event will have a field called myTime which will have value "Last Week" if older than 24 hours than now, "Today" if within last 24 hours and "Other" for default case.

Since you specifically wanted last 24 hours which might not be the case "Today" but you can change that term to whatever name like "Last 24 hours"

The second command |chart XXX over YYY by Ccc means keep YYY on x axis when plotting XXX and create the divisions by Ccc

0 Karma

demkic
Explorer

Thank you very much for the explanation. I think for the purpose of this analysis it's actually best that I display the results in a table only. However, when I take the command | chart count over errorType by myTime our of the query, I get empty results. With this command, even if I save the report as a statistics table only, it still displays the chart as well.

0 Karma

demkic
Explorer

Hi there, when running your query I am receiving this error: "Error in 'chart' command: The specifier 'percChange' is invalid. It must be in form (). For example: max(size)."

0 Karma

gokadroid
Motivator

try max(size) as that will help.

0 Karma

demkic
Explorer

Thank you for your reply. I actually got it to work by just taking out the | chart max(percChange) over errorType line. When I used max(percChange) it changed the calculation.

Here is my final query:

base search... earliest=-7d@d 
| eval myTime=case( _time < relative_time(now(), "-24h"), "Last_Week", _time >= relative_time(now(), "-24h"), "Today", 1=1, "Other") 
| chart count over errorType by myTime 
| eval percChange=round( ('Today'-('Last_Week'/7))*100/('Last_Week'/7) , 2), daily_average_lastweek=round('Last_Week'/7, 0) | table errorType daily_average_lastweek Today percChange 
| sort -percChange | head 10

Could I please ask you to explain to me these two lines of code:
1) | eval myTime=case( _time < relative_time(now(), "-24h"), "Last_Week", _time >= relative_time(now(), "-24h"), "Today", 1=1, "Other")
I am a bit unsure of what relative_time means and in combination with other commands as it is written above.
2) | chart count over errorType by myTime
Here I am unsure of what "over" errorType is doing

Also, when/why should I use single quotes in eval commands?

Thank you!

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...