Splunk Search

How to edit my search for proper visual statistical analysis of problem severity for a web service?

kaurinko
Communicator

Hi,

I am trying to analyze the problem severity of a web service by weighting the fraction of failed requests by the load, i.e. the total number of hits during the analysis period. This way I hope to get rid of false-positive alerts at night, when the load is considerably lower.

My baseline search is:

index=myidex | stats count(eval(STATUS_CODE=2)) AS OK count(eval(STATUS_CODE != 2)) AS NOK by _time | timechart span=60m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

This gives me two curves mirrored about the straight line y=0.5, which is OK. What is NOT OK is the straight line at y=2.0 labelled Factor. It is supposed to give me the number of hits during the time interval, but instead it gives me a constant 2.0, indicating that it is evaluated separately for each and every log line.

The "Factor" here is a crude example showing that for some reason I fail to manipulate the statistics. I tried things like

avg(eval(NOK/max(100,(NOK+OK)))) AS "Failure Fraction"

simply to receive a lower number for times when the actual load is lower.
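As a rough illustration of the weighting I have in mind (plain Python rather than SPL; the function name and the sample counts are made up for the example, and 100 is just the floor from the expression above):

```python
def damped_failure_fraction(nok, ok, floor=100):
    """Failure fraction with the denominator floored at `floor` hits.

    Mirrors NOK/max(100, NOK+OK): when fewer than `floor` requests arrive,
    the fraction is computed as if `floor` requests had arrived.
    """
    return nok / max(floor, nok + ok)

# Hypothetical night-time hour: 10 hits, half of them failing.
night = damped_failure_fraction(nok=5, ok=5)
# Hypothetical daytime hour: 1000 hits, half of them failing.
day = damped_failure_fraction(nok=500, ok=500)

print(night)  # 0.05
print(day)    # 0.5
```

Both hours have the same raw failure fraction of 0.5, but the low-load hour is scaled down, which is the behaviour I am after.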

Could you help me out here? Why can't I do the math where I want to do it? I guess it is something fundamentally simple that I just can't see.

Any help is appreciated!

1 Solution

baerts
Path Finder

I tried:

index=_internal | bin _time span=5m | stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time | timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

and this appears to work (although the Factor values dwarf the other two series on the time chart). I needed to bin the events into the same time interval as the timechart span (5 minutes) before running stats.
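My understanding of why the bin is needed, sketched in plain Python rather than SPL: without it, "stats ... by _time" groups on the exact timestamp, so each row holds only a hit or two and max(2, NOK+OK) stays at 2. The helper names and the sample events below are hypothetical, and the sketch assumes every event has a distinct timestamp.

```python
from collections import defaultdict

def stats_by_time(events, span=None):
    """Count OK/NOK per _time value; if span is given, floor _time to the bucket first."""
    groups = defaultdict(lambda: {"OK": 0, "NOK": 0})
    for t, status in events:
        key = t - t % span if span else t  # what `bin _time span=...` does to _time
        groups[key]["OK" if status == 2 else "NOK"] += 1
    return dict(groups)

def avg_factor(groups):
    """Average of max(2, NOK+OK) over the stats rows, as the timechart avg sees them."""
    values = [max(2, g["OK"] + g["NOK"]) for g in groups.values()]
    return sum(values) / len(values)

# Hypothetical events: (epoch seconds, STATUS_CODE), one every 10 s,
# all inside a single 300 s bucket, every fifth one a failure.
events = [(i * 10, 2 if i % 5 else 3) for i in range(30)]

unbinned = avg_factor(stats_by_time(events))           # 30 rows of 1 hit each
binned = avg_factor(stats_by_time(events, span=300))   # 1 row of 30 hits

print(unbinned)  # 2.0
print(binned)    # 30.0
```

The unbinned case reproduces the constant 2.0 from the question; once the timestamps are floored to the bucket, the same expression returns the real hit count.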

kaurinko
Communicator

Hi!

Thanks a lot for your help. Clearly bin did the trick.

baerts
Path Finder

By the way, when I use:

index=_internal | timechart count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK | eval Nodata=if(NOK+OK=0,1,0)

and choose the stacked area visualization, I get a nice 100% stacked graph with an additional "Nodata" series for the intervals in which there are no OK or NOK hits.
