Splunk Search

How to edit my search for proper visual statistical analysis of problem severity for a web service?

kaurinko
Communicator

Hi,

I am trying to analyze the problem severity of a web service by weighting the failure fraction by the load, i.e. the total number of hits during the analysis period. This way I would like to get rid of false-positive alerts at night, when the load is considerably lower.

My baseline search is:

index=myidex | stats count(eval(STATUS_CODE=2)) AS OK count(eval(STATUS_CODE != 2)) AS NOK by _time | timechart span=60m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

This gives me two curves mirrored about the straight line at y=0.5, which is OK. What is NOT OK is the straight line at y=2.0 labelled Factor. It is supposed to give me the number of hits during the time interval, but instead it gives me a constant 2.0, indicating that it is evaluated separately for each and every log file line.

The "Factor" here is a crude example showing that for some reason I fail to manipulate the statistics. I tried things like

avg(eval(NOK/max(100,(NOK+OK)))) AS "Failure Fraction"

simply to receive a lower number for times when the actual load is lower.

Could you help me out here? Why can't I do the math where I want to do it? I guess it is something fundamentally simple that I just can't see.

Any help is appreciated!

1 Solution

baerts
Path Finder

I tried:

index=_internal |bin _time span=5m |stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time|timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

and this appears to work (although the Factor value is so large that it dwarfs the other series on the time chart). I needed to bin the stats into the same time interval as the timechart (5 minutes).
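
The reason bin matters: without it, stats ... by _time groups on each event's exact timestamp, so each row typically holds only one or a few events, NOK+OK stays around 1, and max(2,NOK+OK) comes out flat at 2. Here is a possible sketch of the load-damped fraction from the question, letting timechart do the bucketing itself and doing the math in eval afterwards. The field names Total, FailureFraction and DampedFailureFraction are only illustrative, and the floor of 100 is taken from the example in the question, not a recommended value:

```
index=_internal
| timechart span=5m count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK
| eval Total = NOK + OK
| eval FailureFraction = if(Total > 0, NOK / Total, null())
| eval DampedFailureFraction = NOK / max(100, Total)
| fields _time FailureFraction DampedFailureFraction
```

With this, an interval with 3 errors out of 10 hits would show a FailureFraction of 0.3 but a DampedFailureFraction of 0.03, while an interval with 300 errors out of 1000 hits would show 0.3 for both, so quiet periods should be less likely to trip an alert threshold.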

kaurinko
Communicator

Hi!

Thanks a lot for your help. Clearly bin did the trick.

baerts
Path Finder

Btw, when I use:

 index=_internal |timechart count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK|eval Nodata=if(NOK+OK=0,1,0) 

and choose the stacked area representation, I get a nice 100% stacked graph with an additional "Nodata" series that shows up whenever there are no OK or NOK hits.
