My search currently gives me some statistics on response times, including total count, average, min, max, and the 99th percentile value (which I assume is computed over the ordered data), based on one of the fields from the query.
I need to find the average response time for the first 99% of my data count per query name, so my logic was to subtract the minimum value from the 99th percentile value and divide that by 99% of the total count. But when I run this, I often get values outside the range between minimum and maximum for certain queries, e.g. Average: 15533.5, Min: 9076, Max: 24737, *99% Average: 479.0*, which clearly wouldn't make sense even if the numbers are heavily skewed.
This was my initial search attempt:
index=* ...
| stats count AS "Query Count", avg(ResponseTime) AS "AvgRespTime", min(ResponseTime) AS "MinRespTime", max(ResponseTime) AS "MaxRespTime", p99(ResponseTime) AS "99PercRespTime" by queryName
| eval AvgRespTime=round(AvgRespTime,1), "99PercAvgRespTime"=('99PercRespTime'-'MinRespTime')/(0.99*'Query Count'), "99PercAvgRespTime"=round('99PercAvgRespTime',1)
| sort - 99PercAvgRespTime
*EDIT: I've spotted a flaw in my initial idea: I'm taking single values rather than sums, which are what averages are actually built from (oops). What I'd ideally want is to subtract the sum of the value(s) at and above the 99th percentile from the total sum, and divide that by the total count minus the count of queries at or above the 99th percentile.*
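In SPL, I think that sum-based logic would look something like this (untested sketch; it assumes eventstats can attach the per-query p99 to each event, and the field names are my own):

```
index=* ...
| eventstats p99(ResponseTime) AS p99 by queryName
| stats count AS total_count, sum(ResponseTime) AS total_sum, sum(eval(if(ResponseTime>=p99, ResponseTime, null()))) AS tail_sum, count(eval(if(ResponseTime>=p99, ResponseTime, null()))) AS tail_count by queryName
| eval trimmed_avg=round((total_sum-tail_sum)/(total_count-tail_count), 1)
```

Here `sum(eval(...))` and `count(eval(...))` should only accumulate the events at or above the cutoff, since null results are skipped by the aggregations.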
(1) Is it even possible to accurately get the average of the first 99% of the data? My logic was that by removing the last 1% of queries (so if there were 1000, the 10 with the highest response times), I could get this average, but what should the query be? I considered taking a 99% trimmed average, but I assume that would remove 0.5% from the lower end as well as the upper end, which is what I've been asked to avoid.
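The closest I've come up with so far is a two-pass sketch (untested; `p99Cutoff` is my own field name): compute the per-query 99th percentile with eventstats, discard only the events at or above it, then run the original stats on what remains, so nothing is trimmed from the lower end:

```
index=* ...
| eventstats p99(ResponseTime) AS p99Cutoff by queryName
| where ResponseTime < p99Cutoff
| stats count AS "Query Count", avg(ResponseTime) AS "99PercAvgRespTime" by queryName
| eval "99PercAvgRespTime"=round('99PercAvgRespTime', 1)
| sort - "99PercAvgRespTime"
```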
(2) Is there a way to show the skewness of the data based on statistics in Splunk? Ideally I'd want a chart, or at least a table, showing this.
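As far as I know, stats has no built-in skewness function, but Pearson's second skewness coefficient, 3*(mean - median)/stdev, can be assembled from aggregations it does have (untested sketch; field names are mine):

```
index=* ...
| stats avg(ResponseTime) AS mean, median(ResponseTime) AS med, stdev(ResponseTime) AS sd by queryName
| eval pearsonSkew=round(3*(mean-med)/sd, 2)
| sort - pearsonSkew
```

For a visual check, binning the response times and charting counts should give a per-query histogram, e.g. `| bin ResponseTime span=1000 | chart count over ResponseTime by queryName`.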