Splunk Search

Compare standard deviation results for two sets of results

gauravg_cvent
Engager

I have a query that uses stdev on the field value "queue_length" by field "queue_name". I need a query that gives me results only if stdev_5m > 2*stdev_hour. But the issue is sometime the "queue_name" doesn't appear in the search for the previous five minutes but it does appear for the previous hour. That's why below Splunk query giving wrong result because it's not comparing same queue_name, it's compare column by column in-respect to which queue name it has in the column.

index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h| where queue_length > 0 | stats stdev(queue_length) AS stdev_hour by queue_name | appendcols [ search index=cvt_metrics sourcetype=service_broker_queue earliest=-5m| where queue_length > 0 | stats stdev(queue_length) AS stdev_5m by queue_name] | eval Result=if(stdev_5m > 2*stdev_hour, "Error", "OK") | search Result="Error"
Tags (2)
0 Karma
1 Solution

DalJeanis
Legend

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour

View solution in original post

0 Karma

DalJeanis
Legend

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour
0 Karma

gauravg_cvent
Engager

Thanks it works but I modified the query to exclude 5m data from last 1h to get more appropriate values.

Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...