Splunk Search

Subsearch vs. stats = different results. Why?

lguinn2
Legend

Here are two searches, which I think are logically equivalent, yet they return different results in Splunk.

Option 1: with a subsearch

index=web sourcetype=access_combined status<400
   [ search index=web sourcetype=access_combined status>=400
    | dedup clientip  | fields clientip ] 
| stats sum(bytes) as bytes by clientip 
| stats avg(bytes) as avg_bytes, median(bytes) as median_bytes

Option 2: using an eval to replace the subsearch

index=web sourcetype=access_combined
| eval check=if(status>=400,"Bad","Okay")
| chart sum(bytes) as bytes by clientip check
| where Bad > 0
| stats avg(Okay) as avg_bytes, median(Okay) as median_bytes

The concept of both searches is the same: Identify IPs that have had HTTP errors in the previous week, and summarize the number of bytes of "successful" traffic, average and median during that timeframe. (My real search is slightly different, but this illustrates the problem perfectly.) Successful traffic is defined as status<400, and HTTP errors are status >=400. I am using the standard access_combined sourcetype for this example, so clientip is the IP address that is connecting to the Apache server, status is the HTTP status code, and bytes is the number of bytes in the HTTP request.

But the searches give slightly different results. For example, the second search gave an average of 251923.11538461538 while the first search gave an average of 42823.32638888889.

I am sure this is something simple that I have overlooked, but I don't see it! I've even looked at the Search Job Inspector, but nothing shows up there either. The subsearch is not hitting any limits on execution time or number of results; the overall data set is fairly small.

What have I missed?

[Update: fixed a couple of typos, which of course I didn't see until I posted... and I had messed up the second search in a big way!]

0 Karma

lguinn2
Legend

This worked for my problem:

Option 1: with a subsearch (same as before)

 index=web sourcetype=access_combined status<400
    [ search index=web sourcetype=access_combined status>=400
     | dedup clientip  | fields clientip ] 
 | stats sum(bytes) as bytes by clientip 
 | stats avg(bytes) as avg_bytes, median(bytes) as median_bytes

Option 2: using an eval to replace the subsearch

index=web sourcetype=access_combined
| eventstats count(eval(status>=400)) as Bad by clientip
| where Bad > 0 AND status < 400
| chart sum(bytes) as bytes by clientip
| stats avg(bytes) as avg_bytes, median(bytes) as median_bytes

Thank you to both @DalJeanis and @sideview - you got me to rethink the problem and my solution...

DalJeanis
SplunkTrust
SplunkTrust

I'd do something like this

 index=web sourcetype=access_combined
 | eval goodbytes=case(status<400,bytes)
 | stats max(eval(case(status>=400,"Y") ) ) as BadFound,
        avg(goodbytes) as avg_bytes, median(goodbytes) as med_bytes   
         by clientip          
 | where isnotnull(BadFound)

The results should be the average and median of the number of good bytes, for only those clientips that had errors.

sideview
SplunkTrust
SplunkTrust

I think the problem is that in the first one, the one with two stats commands, your second stats command is average(bytes) and median(bytes), but the values of bytes being fed into it, are not the small per-event bytes numbers, they are instead the much larger "sum(bytes) per clientip" numbers generated by the first stats.

In other words the first search is doing something kind of unexpected statistically -- "get me the total bytes for each of the N clientips, and then give me the average of those N totals".

Which isn't what you meant, and happens to yield answers that are quite a lot bigger.

Be very careful of average.... you can chain stats together and still do averages and medians and stuff if you're paranoid. the problem is that the "first" one has to preserve both sums and counts.... In this case technically I think you could rewrite the first search's first stats command, to also pass along the counts per clientip, total up all those counts, and then device the final bytes by that factor but..... even as an "exercise for the reader" that's a bit insane.

.

lguinn2
Legend

I think your comment is good - but don't both searches summarize by clientip and then average?

0 Karma

sideview
SplunkTrust
SplunkTrust

YIKES - the question got edited so as to pretty fundamentally change the searches, so

a) my answer doesn't make any sense anymore.

b) The two searches after the edits, return identical results. Tested it pretty extensively and I can find no differences.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...