I have a set of web page performance measurements spanning quite some time, generated by an external monitoring provider. I want to find the mean page performance after removing spikes caused by external factors outside our control. I'm thinking of using a truncated mean as the best measure of central tendency, but I'm having problems with the implementation.
Here's my thinking so far:
I can calculate how many values I should be removing easily, but can't work out how to actually remove them. If there's a better way, I'd love to know it!
My query string (not yet working properly) so far is:
startdaysago=7 monitorid=<foo> | eventstats count(rendertime) as nresults | eval nkeep=nresults-ceil(nresults*0.05) | sort 0 -rendertime | head nkeep
but of course head only accepts a literal integer, not a field value such as nkeep.
Have you considered using outlier to get rid of the edge cases?
http://www.splunk.com/base/Documentation/4.1.5/SearchReference/Outlier
Alternately, how about this:
startdaysago=7 monitorid=<foo>
| eventstats count(rendertime) as nresults
| eval low_clipping=(nresults*0.025)
| eval high_clipping=nresults-low_clipping
| sort 0 rendertime
| streamstats count as sequence_number
| where sequence_number>low_clipping AND sequence_number<high_clipping
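To sanity-check the numbers outside Splunk, here is a minimal Python sketch of the same clipping logic. It is not part of the search above: the function name truncated_mean and the tail_fraction parameter are made up for illustration, and it assumes the render times are already available as a plain list of numbers.

```python
def truncated_mean(values, tail_fraction=0.025):
    """Mean of the values left after clipping tail_fraction from each end.

    Mirrors the search: number the sorted values from 1 and keep only
    those whose sequence number lies strictly between the two limits.
    """
    ordered = sorted(values)
    nresults = len(ordered)
    low_clipping = nresults * tail_fraction
    high_clipping = nresults - low_clipping
    kept = [
        value
        for sequence_number, value in enumerate(ordered, start=1)
        if low_clipping < sequence_number < high_clipping
    ]
    if not kept:
        raise ValueError("no values left after clipping")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical sample: 38 ordinary render times plus one spike and one dip.
    sample = [100] * 38 + [5000, 1]
    print(truncated_mean(sample))
```

Inside Splunk, the equivalent last step would presumably be to append something like | stats avg(rendertime) to the filtered results.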
Awesome! I hadn't managed to find any reasonable examples of 'where', but that's exactly what I need. Thanks!