Splunk Search

count average for sorted 90% of the events (tricky)

amitsehgal
Path Finder

This is little tricky.....I already know there's a method available which can give me percentile...perc90... but what i would want is not discard the events by value of a percentile but number of events of a percentile....e.g. sorted values are : 1,2,3,4,5,6,6,6,7,8,9.Let's say x percentile of dataset is 5 (by value)... and if i discard greater than 5. it would discard all the three 6. but what i would want is sort and calculate the average X percentage of total which means may be 1,2,3,4,5,6,6. Is there any suggestions how to do this ? Already have search query as below which gives by value...need to find how i can fit the sort and than head x percentage by count ???

.. | eventstats perc90(response_time) as response_time_90p | where response_time < response_time_90p | stats avg(response_time)

UPDATE:

Thanks for the prompt response. Sorry if I'm not clear. Let's say if i have response times:

1) data= 1,1,3,3,4,4,4,5,6,6,6,9,9,9,9,7,7,7,7,8,8,8,8,2,2

2) Sort data :1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9

3) Discard Highest 10% of data and keep 90% by volume and not by value. I'm left with 1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8,9,9 (CORRECT)

4) take an average of the 90%

Did you see the difference if i use perc90(data) on above series I get '9' and If i will calculate average of series less than value 9 i would get 1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8 (WRONG) and would miss out on last two nines. Please compare the series WRONG and CORRECT. I hope i'm able to explain the issue.

Thanks, Amit

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

This isn't that hard to do, but it's very hard to parallelize, i.e., it's very ineffective to map-reduce, compared with averaging below a percentile. May I ask what the mathematical basis for preferring this computation to simply averaging everything below the 90th percentile would be? Especially since they would only have different results when considering artificial and pathological data sets, particularly if you measure your response times with sufficient granularity.

... | sort response_time | eventstats count as ttl | streamstats global=t current=t count as pos | where pos<(0.9*ttl) | stats avg(response_time)
0 Karma

amitsehgal
Path Finder

Thanks for the prompt response. Sorry if I'm not clear. Let's say if i have response times: 1) data= 1,1,3,3,4,4,4,5,6,6,6,9,9,9,9,7,7,7,7,8,8,8,8,2,2 2) Sort data :1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9 3) Discard Highest 10% of data and keep 90% by volume and not by value. I'm left with 1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8,9,9 (CORRECT) 4) take an average of the 90%

Did you see the difference if i use perc90(data) on above series I get '9' and If i will calculate average of series less than value 9 i would get 1,1,2,2,3,3,4,4,4,5,6,6,6,7,7,7,7,8,8,8,8 (WRONG) and would miss out on last two nines. Please compare the series WRONG and CORRECT. I hope i'm able to explain the issue.

Thanks, Amit

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

If I understand correctly, you calculate the arithmetic mean ("average") of all values that are below the (e.g.) 90th percentile? And it looks like you know how to do that. However, I can't really understand what you then want to do, i.e., what you mean by "sort and then head x percentage"?

0 Karma

sideview
SplunkTrust
SplunkTrust

I'm having difficulty understanding - in your example what does the "average X percentage of total which means may be 1,2,3,4,5,6,6" mean? Maybe add more examples?

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...