Solved: Streamstats with time window

gschmitz · 01-15-2013

Hi,
a question about streamstats as described here:
http://docs.splunk.com/Documentation/Splunk/5.0.1/SearchReference/Streamstats
It works as described, but my query needs to look at events within a time range rather than a fixed number of events. Do you know of a way to manage that?
In other words, I want an accumulated count of all events in the 24 hours before the one I'm looking at right now. The indexing volume might be a good example.

If window accepted the same time syntax as earliest or latest, that would be awesome. It could look like:
streamstats avg(foo) window=-24h

EDIT: Maybe join can help, but I couldn't get the parent query's _time to serve as earliest and latest in the subsearch.
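For reference, the exact semantics asked for here (for each event, count and average foo over the events in the preceding 24 hours) can be sketched in plain Python. This is not Splunk code. It assumes events arrive sorted by ascending timestamp in epoch seconds, that each carries a numeric foo value, and that an event exactly 24 hours old falls outside the window; the function name time_window_stats is hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple

WINDOW_SECONDS = 24 * 60 * 60  # 24 hours


def time_window_stats(events: Iterable[Tuple[float, float]],
                      window: float = WINDOW_SECONDS) -> List[Tuple[float, int, float]]:
    """For each (timestamp, foo) event, in ascending time order, return
    (timestamp, count, avg_foo) over events with timestamp in (t - window, t]."""
    buf = deque()  # events currently inside the window
    running_sum = 0.0
    out = []
    for t, foo in events:
        buf.append((t, foo))
        running_sum += foo
        # Evict events that are `window` or more seconds older than the current one.
        while buf[0][0] <= t - window:
            _, old_foo = buf.popleft()
            running_sum -= old_foo
        out.append((t, len(buf), running_sum / len(buf)))
    return out


print(time_window_stats([(0, 10.0), (3600, 20.0), (90000, 30.0)]))
# [(0, 1, 10.0), (3600, 2, 15.0), (90000, 1, 30.0)]
```

Each event enters the deque once and leaves it at most once, so a single pass over the events is enough.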

martin_mueller · 01-16-2013

This is not strictly what you describe, but it may do as a workaround.

Consider two steps. First, count or sum using a timechart (or bin and stats, if you prefer). Second, run streamstats with an integer window: because the buckets now have a fixed width, you know how many of them make up 24 hours (for example, 1440 one-minute buckets).

In your example you mentioned avg(foo). In that case you need to think about the information lost when averaging in two steps. For example, if you bin by minute and have ten events in one minute but only one in the next, averaging the per-minute averages gives that single event as much weight as all ten other events combined. One solution is to keep a sum and a count, and compute the average yourself at the very end.
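To make the weighting problem concrete, here is a small Python sketch (not SPL) with two one-minute buckets, ten events in the first and one in the second. The bucket labels, values, and function names are made up for illustration.

```python
from typing import Dict, List


def avg_of_avgs(buckets: Dict[str, List[float]]) -> float:
    """Average the per-bucket averages, so every bucket counts equally."""
    per_bucket = [sum(vals) / len(vals) for vals in buckets.values() if vals]
    return sum(per_bucket) / len(per_bucket)


def avg_from_sum_and_count(buckets: Dict[str, List[float]]) -> float:
    """Keep a sum and a count per bucket and divide only at the very end."""
    total_sum = sum(sum(vals) for vals in buckets.values())
    total_count = sum(len(vals) for vals in buckets.values())
    return total_sum / total_count


# Ten events in the first minute, one event in the second minute.
buckets = {"12:00": [1.0] * 10, "12:01": [100.0]}
print(avg_of_avgs(buckets))             # 50.5, i.e. (1.0 + 100.0) / 2
print(avg_from_sum_and_count(buckets))  # 10.0, i.e. (10.0 + 100.0) / 11
```

The second function is the "keep a sum and a count" approach: the lone 100.0 no longer dominates the result.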

martin_mueller · 01-28-2013

Indeed, the point of a timechart is to have continuous values without missing buckets.

gschmitz · 01-27-2013

Sorry for not seeing this earlier. I expected Splunk to send an email for any replies.
How would this work for missing buckets? window only counts events, and timechart doesn't render empty buckets, does it?
EDIT: Never mind, timechart does return zero for empty buckets, so this approach works down to 1s resolution!
Thx!
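Why the zero-filled buckets matter: the streamstats window counts rows, not time, so a fixed number of rows only corresponds to a fixed time span when every bucket is present. The Python sketch below (not SPL; the helper names and sample counts are hypothetical) compares a row-based window over only the non-empty minutes with one over zero-filled minutes, which is what timechart returns according to this thread.

```python
from typing import Dict, List


def rolling_sum_rows(values: List[int], window: int) -> List[int]:
    """Row-based window: sum of the current row and up to window-1 previous rows."""
    return [sum(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]


def zero_fill(counts: Dict[int, int], start: int, end: int) -> List[int]:
    """Emit one row per minute from start to end inclusive, 0 where there were no events."""
    return [counts.get(minute, 0) for minute in range(start, end + 1)]


# Event counts per minute; minutes 1-3 had no events at all.
counts = {0: 5, 4: 2, 5: 1}
window = 3  # intended meaning: "the last 3 minutes"

sparse = [counts[m] for m in sorted(counts)]  # only non-empty minutes: [5, 2, 1]
dense = zero_fill(counts, 0, 5)               # [5, 0, 0, 0, 2, 1]

print(rolling_sum_rows(sparse, window)[-1])  # 8: the last 3 rows reach back to minute 0
print(rolling_sum_rows(dense, window)[-1])   # 3: minutes 3, 4 and 5 only
```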

martin_mueller · 01-16-2013

How about this?

...  | timechart span=1m count, sum(field) as sum_field | streamstats window=1440 sum(count) as total_count, sum(sum_field) as total_sum

For every minute you get total_count and total_sum over the preceding 24 hours (1440 one-minute buckets, including the current one). From those you can eval the floating 24h average, e.g. | eval avg_24h = total_sum / total_count
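What that search computes per row can be sketched in Python (not SPL). This assumes one (count, sum_field) pair per minute, in ascending time order and zero-filled as timechart returns it, with the window covering the current row plus the previous window-1 rows. The function name rolling_avg and returning None for an empty window are choices of this sketch only.

```python
from typing import List, Optional, Tuple

MINUTES_PER_DAY = 24 * 60  # 1440, the window used in the search above


def rolling_avg(per_minute: List[Tuple[int, float]],
                window: int = MINUTES_PER_DAY) -> List[Optional[float]]:
    """per_minute holds (count, sum_field) for consecutive, zero-filled minutes.
    Returns total_sum / total_count over the last `window` minutes (including
    the current one), or None where that window contains no events."""
    out: List[Optional[float]] = []
    total_count, total_sum = 0, 0.0
    for i, (count, sum_field) in enumerate(per_minute):
        total_count += count
        total_sum += sum_field
        if i >= window:  # drop the bucket that just slid out of the window
            old_count, old_sum = per_minute[i - window]
            total_count -= old_count
            total_sum -= old_sum
        out.append(total_sum / total_count if total_count else None)
    return out


# Tiny example with a 2-minute window instead of 1440.
print(rolling_avg([(2, 10.0), (0, 0.0), (1, 4.0), (0, 0.0), (0, 0.0)], window=2))
# [5.0, 5.0, 4.0, 4.0, None]
```

Keeping running totals and subtracting the bucket that leaves the window avoids re-summing all 1440 buckets for every row, and dividing only at the end keeps the average weighted by event count.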

gschmitz · 01-16-2013

Thank you for the idea. While it works very well as an approximation, I'm still wondering how you would go about keeping a count and a sum.
|bucket kb, count(*) span=1d did not work for me, at least.