Splunk Search

Historical average of the last 30 days

JeusTheHun
New Member

I have a type of event that happens about 20 times a day. Each event carries a numeric value, and the meaningful metric is the sum of that value for each day.

To compare against the current day's sum, I want the average of this daily sum over the past 30 days. But not only that: I also want a history of this average for each successive 30-day span counting back from now.

Say today is 10/05: I want to see the average of the daily sum for 10/05 -> 09/05, 09/05 -> 08/05, 08/05 -> 07/05, and so on for the past 6 months.

So far I've done this:

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time
| eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000
| timechart avg(kTotalEntitiesProcessed) span=30d

Back in reality, I only have data from a few days ago, so the result should show me just one value: the last 30 days.
But unfortunately, even though today is 10/05, it shows me a value for 2017-09-29 and another for 2017-08-30.

The time range I selected is the last 6 months up to now, with no snap-to.

DalJeanis
SplunkTrust

Try this ...

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0 
| bucket _time span=day 
| stats sum(entitiesProcessed) AS TEP BY _time 
| eval KTEP = TEP / 1000 
| streamstats count as DaysInPeriod avg(KTEP) as avgKTEP window=30

That gives you the 30-day average on every day. If you want to cut it down to only one record per 30-day increment, you can do something like this:

| eventstats max(_time) as maxtime
| where (  (maxtime-_time) % (86400*30) = 0)
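
To see what that modulo filter keeps, here is a self-contained sketch; it fabricates 91 daily rows with makeresults, so it needs no real data. Only the rows sitting exactly 0, 30, 60, and 90 days behind the newest row survive:

| makeresults count=91
| streamstats count as n
| eval _time = relative_time(now(), "@d") - (n - 1) * 86400
| eventstats max(_time) as maxtime
| where (maxtime - _time) % (86400 * 30) = 0
| eval day = strftime(_time, "%Y-%m-%d")
| table day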

JeusTheHun
New Member

I've added | timechart first(avgKTEP) so I can draw a chart from it.

To chart the speed of each event (entitiesProcessed / duration), I've written this:

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0 
 | bucket _time span=day 
 | eval timeStart = strptime(date, "%+")
 | eval timeEnd = strptime(endDate, "%+")
 | eval duration = timeEnd - timeStart
 | stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time 
 | eval speed = totalEntitiesProcessed / totalDuration
 | streamstats count as DaysInPeriod avg(speed) as avgSpeed window=30
 | eventstats max(_time) as maxtime
 | where (  (maxtime-_time) % (86400*30) = 0) 
 | timechart first(avgSpeed)

The query runs without error, but the average speed is wrong.
In plain words: the average speed is day-based, not event-based. The sum of entitiesProcessed for one day, divided by that day's totalDuration, gives the speed of the day. The average speed for the past 30 days is then the average of each day's speed over those 30 days. For example, a day with 10,000 entities processed over a total duration of 2,000 seconds has a speed of 5 entities per second, and the 30-day figure is the mean of those daily speeds. That may sound redundant, but I hope there is no more ambiguity.

I thought I had expressed exactly that in the query, but obviously I'm wrong somewhere, aren't I?

somesoni2
SplunkTrust

The span on timechart/bucket is calculated from 01-01-1970 (the Unix epoch), so it won't necessarily snap to the current day. If you always run this for the last 6 months, try this:

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0 
| bucket _time span=day | stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time 
| eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000 
| eval _time = case(
    _time >= relative_time(now(), "@d"), _time,
    _time >= relative_time(now(), "@d-1mon"), relative_time(now(), "@d-1mon"),
    _time >= relative_time(now(), "@d-2mon"), relative_time(now(), "@d-2mon"),
    _time >= relative_time(now(), "@d-3mon"), relative_time(now(), "@d-3mon"),
    _time >= relative_time(now(), "@d-4mon"), relative_time(now(), "@d-4mon"),
    _time >= relative_time(now(), "@d-5mon"), relative_time(now(), "@d-5mon"))
| stats avg(kTotalEntitiesProcessed) by _time
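
As a quick sanity check of that epoch alignment, this throwaway search (makeresults again, so it runs against no index) prints where a span=30d bucket boundary actually falls for the current time; it is generally not today's midnight:

| makeresults
| eval now_h = strftime(_time, "%Y-%m-%d")
| bin _time span=30d
| eval bucket_start_h = strftime(_time, "%Y-%m-%d")
| table now_h bucket_start_h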

JeusTheHun
New Member

Thanks for your answer. Can you explain the eval statement, please? As far as I understand it, you modify the _time value, shifting it in a way that matches my need and works around the bucket behavior.

somesoni2
SplunkTrust

That is correct. If today is 10/03, you want one bucket for the current day (10/03), one for the previous month (09/03 to 10/02), and so on. So the eval statement updates the _time value as follows:
1) If _time is from today (greater than or equal to midnight today), keep it as-is; since the earlier bucket already snapped it to midnight, all of today's events share midnight today.
2) Otherwise, if _time falls within the month ending yesterday (09/03 to 10/02), assign midnight of 09/03 to all those events. With the same _time on all of them, the average is easy to calculate.
3) Otherwise, if _time falls between two months ago and one month ago (08/03 to 09/02), assign midnight of 08/03 to all those events... and so on.
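
To make the mapping concrete, here is a tiny worked example that fakes a single event from 45 days ago with makeresults. The first two case() branches fail (the event is older than midnight today and older than @d-1mon), so it lands on midnight of @d-2mon:

| makeresults
| eval _time = relative_time(now(), "-45d@d")
| eval bucket = case(
    _time >= relative_time(now(), "@d"), _time,
    _time >= relative_time(now(), "@d-1mon"), relative_time(now(), "@d-1mon"),
    _time >= relative_time(now(), "@d-2mon"), relative_time(now(), "@d-2mon"))
| eval bucket_h = strftime(bucket, "%Y-%m-%d")
| table bucket_h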

JeusTheHun
New Member

I've ended up writing this:

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0 
 | bucket _time span=day | stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time 
 | eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000 
 | eval _time = case(
    _time >= relative_time(now(),"@d"), _time,
    _time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
    _time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
    _time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
    _time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
    _time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
)
 | timechart avg(kTotalEntitiesProcessed)

And I have what I wanted. For a related chart, I would like the average speed of the events. I've written this, but I think the stats sum is messing up the query. Can you explain why?

sourcetype=BatchLog name=OperationInsertionBatchJob  entitiesProcessed > 0
| bucket _time span=day
| stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time date
| eval timeStart = strptime(date, "%+")
| eval timeEnd = strptime(endDate, "%+")
| eval duration = timeEnd - timeStart
| eval _time = case(
    _time >= relative_time(now(),"@d"), _time,
    _time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
    _time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
    _time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
    _time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
    _time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
)
| eval speed = totalEntitiesProcessed / totalDuration | timechart avg(speed)

EDIT: after checking, the output average is wrong 😕

somesoni2
SplunkTrust

Your order of operations is off: the stats has to run after your eval of duration, so that the field you calculated is available to it. It should look like this. Also check whether you really need the field date in the stats BY clause, since _time is already bucketed by day.

sourcetype=BatchLog name=OperationInsertionBatchJob  entitiesProcessed > 0
 | bucket _time span=day
 | eval timeStart = strptime(date, "%+")
 | eval timeEnd = strptime(endDate, "%+")
 | eval duration = timeEnd - timeStart
 | stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time
 | eval _time = case(
     _time >= relative_time(now(),"@d"), _time,
     _time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
     _time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
     _time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
     _time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
     _time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
 )
 | eval speed = totalEntitiesProcessed / totalDuration | timechart avg(speed)

JeusTheHun
New Member

Your updated query only shows an empty avg(speed) 😕
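
One plausible culprit when avg(speed) comes back all null (a guess, since the raw field formats are unknown here): strptime() returns null whenever the format string does not match the raw value, and that null silently propagates through duration, totalDuration, and speed. A quick check against a few raw events, assuming the fields really are named date and endDate:

sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| head 5
| eval timeStart = strptime(date, "%+")
| eval timeEnd = strptime(endDate, "%+")
| table date timeStart endDate timeEnd

If timeStart or timeEnd come back empty here, the "%+" format string needs to be swapped for one that matches how those fields are actually written.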
