Solved: Compare standard deviation results for two sets of...

gauravg_cvent · ‎11-27-2017

I have a query that uses stdev on the field value "queue_length" by field "queue_name". I need a query that gives me results only if stdev_5m > 2*stdev_hour. But the issue is sometime the "queue_name" doesn't appear in the search for the previous five minutes but it does appear for the previous hour. That's why below Splunk query giving wrong result because it's not comparing same queue_name, it's compare column by column in-respect to which queue name it has in the column.

index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h| where queue_length > 0 | stats stdev(queue_length) AS stdev_hour by queue_name | appendcols [ search index=cvt_metrics sourcetype=service_broker_queue earliest=-5m| where queue_length > 0 | stats stdev(queue_length) AS stdev_5m by queue_name] | eval Result=if(stdev_5m > 2*stdev_hour, "Error", "OK") | search Result="Error"

DalJeanis · ‎11-27-2017

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour

View solution in original post

DalJeanis · ‎11-27-2017

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour

gauravg_cvent · ‎11-28-2017

Thanks it works but I modified the query to exclude 5m data from last 1h to get more appropriate values.

Compare standard deviation results for two sets of results

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life