Splunk Search

eval with mathematical calculations?

tfitzgerald15
Explorer

I'm trying to do something a little wonky here, so please bear with me. The code below shows the logical flow of what I'm trying to accomplish. However, I know for a fact it won't work as written. Can you give me some insight?

... | eval threshold=(outlier action=rm param=10 (stdev(count)))

Basically, I'm trying to create a field, "threshold", that holds the standard deviation of my results, with any outliers removed before the standard deviation is calculated. (I don't want that one result of 7,000 skewing my standard deviation upwards when it would normally have been, say, 10.)

0 Karma

jhupka
Path Finder

Is this what you're looking for? Here's a search that does what I think you're asking for, using indexer lag (_indextime - _time) as the example value. If it does what you want, you'll just need to modify it to fit your search and data:

index=_internal | eval indexer_lag =_indextime - _time 
| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 
| where indexer_lag < threshold 
| eventstats stdev(indexer_lag) as threshold_stddev

I'll explain each part:

index=_internal | eval indexer_lag =_indextime - _time 

^^^ Calculate the indexer lag for each event.

 | eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 

^^^ Now use eventstats to attach q1 and q3 (the 25th and 75th percentiles) to every event, then calculate the interquartile range and a threshold of 10*iqr, based on the param=10 in your original post.

| where indexer_lag < threshold 

^^^ Only keep events whose lag is less than our threshold, i.e. remove our 10*iqr outliers.

| eventstats stdev(indexer_lag) as threshold_stddev

^^^ Finally, use eventstats again to calculate the new standard deviation in-line, based on the events that remain after filtering.
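
One thing to be aware of: the where clause in that search only drops high values, and it compares the lag directly against 10*iqr rather than against a distance from the quartiles. If you also care about unusually low values, a two-sided variation might look like the sketch below. The lower and upper field names are made up for this illustration, and the fences (q1 minus 10*iqr, q3 plus 10*iqr) are one common way to define outliers, not necessarily the exact rule the outlier command applies:

```
index=_internal
| eval indexer_lag=_indextime-_time
| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3
| eval iqr=q3-q1
| eval lower=q1-(10*iqr), upper=q3+(10*iqr)
| where indexer_lag>=lower AND indexer_lag<=upper
| eventstats stdev(indexer_lag) as threshold_stddev
```

Everything except the two fence fields and the where clause is the same as the original search, so you can swap between the one-sided and two-sided filters and compare the resulting threshold_stddev.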

jhupka
Path Finder

You don't need the indexer_lag field, per se, but your overall search will be similar. If you're looking at a specific sourcetype across all indexes, then your search might start like this:

index=* sourcetype=tfitzgerald15s_type |

The indexer_lag field is just something I calculated for the example, to have a numeric value to base the threshold on. In your case it might be a value you calculate for CPU usage, HTTP response times, or transaction duration.
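
As a rough sketch of that adaptation, here is the same pattern applied to a count field like the one in your original post. I'm assuming the count comes from a stats count by host, which is only a guess at how your results are built, and tfitzgerald15s_type is a placeholder for your real sourcetype:

```
index=* sourcetype=tfitzgerald15s_type
| stats count by host
| eventstats p25(count) as q1, p75(count) as q3
| eval iqr=q3-q1
| eval threshold=10*iqr
| where count < threshold
| eventstats stdev(count) as threshold_stddev
```

Replace the stats line with whatever produces the number you want to measure. The rest of the pipeline only needs one numeric field per result to work with.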

Also, if this answers your question don't forget to accept/up-vote the answer 🙂

0 Karma

tfitzgerald15
Explorer

Awesome, thanks! I do just have one question. I'm not pointing to a specific indexer, I'm looking at a specific sourcetype. Would I still need the indexer_lag, and what does that represent? Apologies for the admitted newbie question there.

0 Karma