Hi,
I have a query like this:
(splunk_server="serverA" OR splunk_server="serverB") (app="Cargo" OR app="Customer") index="dev-cargo-app" env="DEV" site=* (sourcetype=app:Cargo:Performance) ms | stats avg(duration) as avgdur, perc95(duration) as 95perc
I am trying to make it more efficient using summary indexing:
Summary index:
(splunk_server="serverA" OR splunk_server="serverB") (app="Cargo" OR app="Customer") index="dev-cargo-app" env="DEV" site=* (sourcetype=app:Cargo:Performance OR sourcetype=app:Customer:Performance) ms
| stats count(_raw) as "No. of Events",values(duration) as "Duration" by app, site, sourcetype, category, _time
My 'efficient' query is:
index=summary report="summary_index_name" | search sourcetype=app:Cargo:Performance
| stats avg(Duration) as avgdur, perc95(Duration) as 95perc
Average duration is calculated correctly, but perc95(Duration) does not match.
My 'efficient' query returns a perc95(Duration) that is about 100 higher than the one from the normal query.
Is it not possible to calculate perc95(Duration) using summary index?
Thanks,
Deepak
1) The biggest problem is "values(duration)". values() eliminates duplicates within each group, so averages and percentiles computed from the summary cannot be relied on, even if the average happens to match.
2) Use sistats, not stats, to populate a summary index. It will keep the "shape" of the underlying data.
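Point 1 can be checked with a throwaway search on synthetic data. This is only a sketch: the four events and the field names n, avg_raw and distinct_values_kept are made up for illustration and do not come from the question.

```
| makeresults count=4
| streamstats count as n
| eval duration=if(n=4, 500, 100)
| stats avg(duration) as avg_raw, values(duration) as Duration
| eval distinct_values_kept=mvcount(Duration)
```

The four durations are 100, 100, 100 and 500, so their true average is 200. values(duration) keeps only the distinct values 100 and 500, so anything computed later from Duration sees two data points instead of four.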
See this page for more tips, and a link to a video - http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Knowledge/Usesummaryindexing
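Applied to the searches in the question, point 2 might look roughly like the sketch below. It assumes the populating search is a scheduled search with summary indexing enabled, and it drops _time from the by clause on the assumption that the schedule interval provides the time granularity. The report="summary_index_name" filter and the sourcetype filter are copied from the question. If your summary events store the original sourcetype under another name (for example orig_sourcetype), adjust the filter. Treat all of this as something to verify against your own data.

Populating search:

```
(splunk_server="serverA" OR splunk_server="serverB") (app="Cargo" OR app="Customer") index="dev-cargo-app" env="DEV" site=* (sourcetype=app:Cargo:Performance OR sourcetype=app:Customer:Performance) ms
| sistats avg(duration), perc95(duration) by app, site, sourcetype, category
```

Reporting search:

```
index=summary report="summary_index_name" sourcetype=app:Cargo:Performance
| stats avg(duration) as avgdur, perc95(duration) as 95perc
```

The reporting search has to use the same functions on the same field (avg(duration) and perc95(duration)) that sistats was given. It can use fewer by fields than the populating search, but not different aggregations.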