Comments and answers for "how do the stats p* commands work in summary rollups?"
https://answers.splunk.com/answers/386373/how-do-the-stats-p-commands-work-in-summary-rollup.html
Comment by cphair on cphair's answer
https://answers.splunk.com/comments/386384/view.html
Yeah, I kind of figured there wasn't any magic. We have the same problem with average, since we're calculating an average of averages, but I knew about that one. I don't think running a daily search over the raw data will fly, but I'll see if people are okay with the reduced accuracy of the p* values over the hourly data. Thanks.
Tue, 29 Mar 2016 14:30:52 GMT - cphair

Answer by jeffland
https://answers.splunk.com/answering/387087/view.html
As much as I like to think that Splunk is magic, it has no feature to recalculate from the raw data when the search itself doesn't run over the raw data.
If you build an hourly summary containing p95(value), a daily summary computed on top of it is just the 95th percentile of your 24 hourly data points, not of the underlying events. Maybe you can run a p95(value) over each full day at the end of that day? Because with min, max (and avg, provided you weight it by event counts), the summaries should be okay to use in subsequent summaries. I'm not sure what it's called, but I think there is a mathematical term for this "stability" over multiple iterations... if anyone knows, feel free to comment.
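As a quick sanity check of the point above, here is a toy Python sketch (made-up numbers, not Splunk itself): min and max re-aggregate exactly, an average re-aggregates exactly only if you carry per-bucket sum and count, and a percentile of per-bucket percentiles (here the median) can drift from the true value.

```python
import statistics

# Toy raw data split into unevenly sized "hourly" buckets
# (assumed values for illustration, not from the thread).
raw = list(range(1, 11))          # 1..10
buckets = [raw[:3], raw[3:]]      # sizes 3 and 7

# min/max: stable under re-aggregation.
assert max(max(b) for b in buckets) == max(raw)
assert min(min(b) for b in buckets) == min(raw)

# avg: an unweighted average of bucket averages is wrong when bucket
# sizes differ; recombining per-bucket sum and count is exact.
naive_avg = statistics.mean([statistics.mean(b) for b in buckets])   # 4.5
true_avg = sum(raw) / len(raw)                                       # 5.5
weighted_avg = sum(sum(b) for b in buckets) / sum(len(b) for b in buckets)
assert weighted_avg == true_avg
assert naive_avg != true_avg

# percentile (median here): the median of bucket medians drifts.
med_of_meds = statistics.median([statistics.median(b) for b in buckets])
assert med_of_meds != statistics.median(raw)                         # 4.5 vs 5.5
```

The sum-and-count trick is why storing sum(value) and count in the hourly summary, instead of avg(value), keeps daily averages exact.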
Anyhow, min/max are okay but percentiles will differ. How much this deviates from the "actual" value is not easily predictable - take this example:
index=_internal earliest=-1h latest=now kbps="*"
| bucket span=1m _time as time_buckets_minutes
| bucket span=10m _time as time_buckets_10m
| eventstats p70(kbps) as p70_minute by time_buckets_minutes
| eventstats p70(kbps) as p70_10m by time_buckets_10m
| eventstats p70(p70_minute) as p70_10m_from_p70_minute by time_buckets_10m
| table _time time_buckets_* kbps p70_*
| fieldformat time_buckets_minutes=strftime(time_buckets_minutes, "%F %T")
| fieldformat time_buckets_10m=strftime(time_buckets_10m, "%F %T")
It shows the raw sample data (_time, kbps), two bucketed time spans to simulate your summary ranges (one per minute, one per ten minutes), and then the 70th percentile of kbps per minute (p70_minute), per ten minutes (p70_10m), and a per-ten-minute value computed from the per-minute percentiles (p70_10m_from_p70_minute). I chose the 70th percentile for the sake of easier comprehension. If you compare p70_10m with p70_10m_from_p70_minute, you'll see the differences. If you append
| eval diff=abs(p70_10m-p70_10m_from_p70_minute) | stats avg(kbps) p10(diff) p90(diff) avg(diff)
to the above search, you get some statistics about the differences. But keep in mind this may be deceiving, because the result will vary with the raw data.
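For those without a Splunk instance handy, here is a rough Python analogue of the same comparison (synthetic kbps values and a nearest-rank percentile; Splunk's p70 interpolation may differ, so treat this as a sketch of the effect, not its exact magnitude):

```python
import math

def p70(values):
    # Nearest-rank 70th percentile; one common convention, chosen here
    # for simplicity (Splunk may interpolate differently).
    s = sorted(values)
    rank = math.ceil(0.70 * len(s))
    return s[rank - 1]

# Ten synthetic "minutes", each with six kbps samples (assumed data).
minutes = [[m * 10 + k for k in range(6)] for m in range(10)]

# p70 over the full 10-minute window vs. p70 of the per-minute p70s.
direct = p70([v for minute in minutes for v in minute])
from_minute_p70s = p70([p70(minute) for minute in minutes])

print(direct, from_minute_p70s)   # the two values disagree
```

Even on this smooth synthetic data the two values differ slightly; on spiky real data the gap can be much larger, which is why the averaged diff statistics above shouldn't be read as a general error bound.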
I'd suggest doing a summary over the whole day's raw data after each day to calculate that day's percentiles.
Tue, 29 Mar 2016 14:27:31 GMT - jeffland