Splunk Search

Compare standard deviation results for two sets of results

gauravg_cvent
Engager

I have a query that uses stdev on the field value "queue_length" by field "queue_name". I need a query that gives me results only if stdev_5m > 2*stdev_hour. But the issue is sometime the "queue_name" doesn't appear in the search for the previous five minutes but it does appear for the previous hour. That's why below Splunk query giving wrong result because it's not comparing same queue_name, it's compare column by column in-respect to which queue name it has in the column.

index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h| where queue_length > 0 | stats stdev(queue_length) AS stdev_hour by queue_name | appendcols [ search index=cvt_metrics sourcetype=service_broker_queue earliest=-5m| where queue_length > 0 | stats stdev(queue_length) AS stdev_5m by queue_name] | eval Result=if(stdev_5m > 2*stdev_hour, "Error", "OK") | search Result="Error"
Tags (2)
0 Karma
1 Solution

DalJeanis
Legend

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour

View solution in original post

0 Karma

DalJeanis
Legend

The problem is that you are using appendcols. Don't do that. It is almost always the wrong method, unless you can be absolutely certain that each query will produce exactly the same results in the same order... and probably not even then.

In this case, you are also going back and getting the same records twice. Instead, you should just calculate the stdev of the 5m subset separately with the 60m records, using an eval. The addinfo gets you the end time of the data (info_max_time) and then you subtract 300 seconds.

 index=cvt_metrics sourcetype=report_service_broker_queue earliest=-1h
| where queue_length > 0 
| addinfo 
| eval time5m = info_max_time - 300
| stats stdev(queue_length) AS stdev_hour,  
    stdev(eval(case(_time>=time5m,queue_length))) as stdev_5m
    by queue_name 
| where stdev_5m>2*stdev_hour
0 Karma

gauravg_cvent
Engager

Thanks it works but I modified the query to exclude 5m data from last 1h to get more appropriate values.

Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...