Splunk Search

How can I check if the number of events in the last hour is greater than the second standard deviation of hourly occurrences over the last week?

crisjnelson
Explorer

I am trying to determine if the number of Full GC events in the last hour is greater than the 2nd standard deviation of hourly Full GC events in the last 7 days. I am trying to use a subsearch, but can't figure out how that should work. What am I doing wrong here?

sourcetype=aps_gc earliest=-1h
| stats count as new_gc_count
| where new_gc_count>std_dev*2
    [ search sourcetype=aps_gc earliest=-168h latest=-1h
    | timechart span=1h count as old_gc_count
    | stats stdev(old_gc_count) as std_dev ]


jplumsdaine22
Influencer

Also, are you checking whether new_gc_count is greater than std_dev*2, or whether it is greater than the mean plus 2 standard deviations?

There's a significant difference!
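To make the difference concrete, here is a quick offline check of the two readings in Python (not SPL). The hourly counts are made up for illustration; `statistics.stdev` is the sample standard deviation, which is assumed here to match Splunk's `stdev()` (Splunk's `stdevp()` is the population version).

```python
import statistics

# Hypothetical hourly Full GC counts for earlier hours (illustrative data only).
hourly_counts = [40, 42, 38, 45, 41, 39, 44, 43]
new_gc_count = 42  # hypothetical count for the last hour: a perfectly ordinary hour

mean = statistics.mean(hourly_counts)
# Sample standard deviation, like Splunk's stdev().
std_dev = statistics.stdev(hourly_counts)

# Reading 1: compare against twice the standard deviation alone.
above_2sd = new_gc_count > 2 * std_dev
# Reading 2: compare against the mean plus two standard deviations.
above_mean_plus_2sd = new_gc_count > mean + 2 * std_dev

print(f"mean={mean:.2f} std_dev={std_dev:.2f}")
print(f"> 2*std_dev: {above_2sd}")
print(f"> mean + 2*std_dev: {above_mean_plus_2sd}")
```

With these numbers the standard deviation is about 2.45, so an ordinary hour of 42 events already exceeds 2*std_dev (about 4.9), while it stays below mean + 2*std_dev (about 46.4). The first reading would alert on almost every hour.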


DalJeanis
Legend

The following searches work, but they should probably use tstats instead of stats, because the per-hour counts are already available from the index. I'll leave that as an exercise for the new Splunkers in the community, and I'll upvote the first correct version. Any takers?


@crisjnelson:

Avoid subsearches when you can. In this case, avoiding one is relatively easy.

search sourcetype=aps_gc earliest=-168h@h latest=@h

| rename COMMENT as "First, calculate the hourly counts for the entire period in question"
| bin _time span=1h
| stats count as hourcount by _time

| rename COMMENT as "Next, find the most recent hour and the average and standard deviation of the hourly counts"
| eventstats max(_time) as maxTime, avg(hourcount) as houravg, stdev(hourcount) as hourstdev

| rename COMMENT as "Now, keep only the most recent hour (the last complete hour, because of latest=@h)"
| where _time==maxTime

| rename COMMENT as "and keep it only if it is more than 2 standard deviations above the average"
| where hourcount>houravg+2*hourstdev
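As a rough offline illustration of what the eventstats pipeline above does, here is a Python sketch. The rows and counts are hypothetical, and the dictionary keys just reuse the field names from the search. It assumes every hour has at least one event, since `stats count by _time` produces no row for an empty hour.

```python
import statistics

# Hypothetical output of "bin _time span=1h | stats count as hourcount by _time":
# one row per hour (hour start as epoch seconds) with its event count. Illustrative data only.
rows = [
    {"_time": 1_700_000_000 + 3600 * i, "hourcount": c}
    for i, c in enumerate([40, 42, 38, 45, 41, 39, 44, 43, 90])
]

# eventstats: compute aggregates over all rows and attach them to every row
# (unlike stats, the individual rows are kept).
max_time = max(r["_time"] for r in rows)
counts = [r["hourcount"] for r in rows]
hour_avg = statistics.mean(counts)
hour_stdev = statistics.stdev(counts)  # sample stdev, assumed to match Splunk's stdev()
for r in rows:
    r.update(maxTime=max_time, houravg=hour_avg, hourstdev=hour_stdev)

# where _time==maxTime: keep only the most recent hour.
latest = [r for r in rows if r["_time"] == r["maxTime"]]
# where hourcount>houravg+2*hourstdev: keep it only if it is an outlier.
alerts = [r for r in latest if r["hourcount"] > r["houravg"] + 2 * r["hourstdev"]]

print(alerts)
```

Here the last hour (90 events) is flagged even though it is part of its own baseline, because the spike is large relative to the other hours.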

Note: the code above includes the current hour's count in the calculation of the average and standard deviation. Excluding it is slightly more resource-intensive, but not hard:

search sourcetype=aps_gc earliest=-168h@h latest=@h

| rename COMMENT as "First, calculate the hourly count for the period in question"
| bin _time span=1h
| stats count as hourcount by _time

| rename COMMENT as "Find the most recent hour; create hourcount2, which leaves that hour out"
| eventstats max(_time) as maxTime
| eval hourcount2=if(_time<maxTime,hourcount,null())

| rename COMMENT as "Next, find the avg and stdev of the earlier hours only (null hourcount2 values are ignored)"
| eventstats avg(hourcount2) as houravg stdev(hourcount2) as hourstdev

| rename COMMENT as "Now, keep only the most recent hour"
| where _time==maxTime

| rename COMMENT as "and keep it only if it is more than 2 standard deviations above the average"
| where hourcount>houravg+2*hourstdev
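Why excluding the tested hour can matter: a spike raises both the mean and the standard deviation it is compared against. Here is a small Python check with hypothetical counts; `is_outlier` is an illustrative helper, not part of Splunk, and the baseline is much shorter than the 168 hours in the real search, so the effect is exaggerated.

```python
import statistics

def is_outlier(history, current, include_current):
    """Return True if `current` exceeds mean + 2 * sample stdev of the baseline.

    include_current=True mirrors the first search (tested hour is in the baseline);
    include_current=False mirrors the second search (only earlier hours).
    """
    baseline = history + [current] if include_current else history
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample stdev, assumed to match Splunk's stdev()
    return current > mean + 2 * sd

# Hypothetical hourly Full GC counts for earlier hours, plus a modest spike.
earlier_hours = [40, 42, 38, 45, 41, 39, 44, 43]
current_hour = 48

print(is_outlier(earlier_hours, current_hour, include_current=True))   # False
print(is_outlier(earlier_hours, current_hour, include_current=False))  # True
```

With the spike included, the threshold rises to about 48.5, so 48 is not flagged; against the earlier hours alone the threshold is about 46.4, so it is.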

woodcock
Esteemed Legend

jplumsdaine22
Influencer

The return command is probably what you want:

sourcetype=aps_gc earliest=-1h 
| stats count as new_gc_count 
| where new_gc_count> 
    [ search sourcetype=aps_gc earliest=-168h latest=-1h 
    | timechart span=1h count as old_gc_count 
    | stats stdev(old_gc_count) as std_dev 
    | eval std_dev=std_dev*2
    | return $std_dev ]