Splunk Search

Why is the ratio calculation of two fields returning wrong results?

sarit_s
Communicator

Hello
I'm trying to calculate the ratio of two fields, but I'm getting wrong results. If I calculate each of them separately I get correct results, but together something is wrong.
index=clientlogs sourcetype=clientlogs Categories="*networkLog*" "Request.url"="*v3/auth*"  Request.url!=*twofactor*  "Request.actionUrl"!="*dev*" AND "Request.actionUrl"!="*staging*"
| eval UserAgent = case(match(UserAgent, ".*ios.*"), "iOS FE",match(UserAgent, ".*android.*"), "Android FE",1=1, "Web FE")
| dedup UserAgent, _time
| stats count as AttemptsFE by UserAgent _time
| appendcols
    [search index=clientlogs sourcetype=clientlogs Categories="*networkLog*" "Request.url"="*v3/auth*"  Request.url!=*twofactor*  "Request.actionUrl"!="*dev*" AND "Request.actionUrl"!="*staging*"
    "Request.status" IN (201, 207) NOT "Request.data.twoFactor.otp.expiresInMs"="*"
    | eval UserAgent = case(match(UserAgent, ".*ios.*"), "iOS FE",match(UserAgent, ".*android.*"), "Android FE",1=1, "Web FE")
    | dedup UserAgent, _time
    | streamstats count as SuccessFE by UserAgent _time]
| eval SuccessRatioFE = round((SuccessFE/AttemptsFE)*100, 2)
| eval SuccessRatioFE = (SuccessFE/AttemptsFE)*100
| timechart bins=100 avg(SuccessRatioFE) as SuccessRatioFE BY UserAgent

PickleRick
SplunkTrust

OK. Some housekeeping stuff first:

1) Don't use wildcards at the beginning of your search term! Never! (Or at least not until you fully understand why you shouldn't.) It forces Splunk to read every single event in the given time range, which will make it sloooooow.

2) Inclusion is always better than exclusion, so Request.url!=*twofactor* not only has that dreaded wildcard at the beginning, it is also an exclusion, which again requires parsing every single event (as if point 1 didn't force Splunk to do that anyway).

3) Both of your searches have

search index=clientlogs sourcetype=clientlogs Categories="*networkLog*" "Request.url"="*v3/auth*"  Request.url!=*twofactor*  "Request.actionUrl"!="*dev*" AND "Request.actionUrl"!="*staging*"

As the "base search". There's no point in running this heavy (see p.1) search twice. Just run the search once and mark some events if needed.

4) I'm always very cautious about the dedup command. I find it unintuitive and prone to producing "strange" results - I prefer stats values() or similar.

5) dedup by _time - what are you trying to achieve here? Especially since you don't do any binning first, so raw timestamps will rarely collide?

Also, as @ITWhisperer already mentioned - appendcols just pastes a set of fields alongside the existing rows without any correlation between them (not to mention subsearch limitations, which might skew the results even further).

So - what's the business case? Because this can surely be done better - most probably with a single search (even if we leave aside the wildcards and exclusions for now).
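For illustration, here is a minimal sketch of the single-search idea from point 3. The success criterion is an assumption pieced together from the filters in your appendcols subsearch - adjust it to your actual definition of success:

index=clientlogs sourcetype=clientlogs Categories="*networkLog*" "Request.url"="*v3/auth*" Request.url!=*twofactor* "Request.actionUrl"!="*dev*" AND "Request.actionUrl"!="*staging*"
| eval UserAgent = case(match(UserAgent, ".*ios.*"), "iOS FE", match(UserAgent, ".*android.*"), "Android FE", 1=1, "Web FE")
| eval success = if(in('Request.status', 201, 207) AND isnull('Request.data.twoFactor.otp.expiresInMs'), 1, 0)
| stats count as attempts sum(success) as successes by UserAgent
| eval ratio = round(successes/attempts*100, 2)

One pass over the data, no appendcols, and attempts and successes are guaranteed to line up because both numbers come from the same rows.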


sarit_s
Communicator

Hello
Thanks for your detailed answer.
What I'm trying to achieve is to calculate the ratio between the attempts and the successes.
Actually, I've searched for details about the base search and how to use it in other parts of the query (I also want to use the base search in another panel on the same dashboard), and I couldn't find any information.

I would really appreciate it if you could correct my query, as I'm not familiar with a different way to achieve the same result.

P.S. - In the end, the final query should append the results of this query to the results of a similar query for the BE part, so the final query will be larger and heavier.


PickleRick
SplunkTrust

No, it's not about the "base search" functionality in dashboards. I'm not talking about that. I'm talking about a base search conceptually.

But conceptually I'd do something like this:

<your base search>
| eval success=if(your_criteria_for_success, 1, 0)
| stats count as total sum(success) as successes by UserAgent
| eval ratio=successes/total

sarit_s
Communicator

OK, got it, thanks.

But how can I change it to a timechart? Because when I run this:

index=clientlogs sourcetype=clientlogs Categories="*networkLog*" "Request.url"="*v3/auth*" Request.url!=*twofactor* "Request.actionUrl"!="*dev*" AND "Request.actionUrl"!="*staging*"
| eval SuccessFE = if(('Request.status' IN (201, 207)), 1, 0)
| eval UserAgent = case(match(UserAgent, ".*ios.*"), "iOS FE", match(UserAgent, ".*android.*"), "Android FE", 1=1, "Web FE")
| stats count as total sum(SuccessFE) as SuccessFE by UserAgent
| eval SuccessRatioFE = (SuccessFE/total)*100
| chart bins=100 values(SuccessRatioFE) over _time by UserAgent

I get no results, while if I run it as a table I see results.
Also, I cannot add the NOT filter I need inside the if statement:

NOT "Request.data.twoFactor.otp.expiresInMs"="*"

 

0 Karma

PickleRick
SplunkTrust

No. The if function has a different syntax. See https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/ConditionalFunctions#if.28.26lt....
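As a hedged sketch - assuming that NOT "Request.data.twoFactor.otp.expiresInMs"="*" simply means the field must be absent on a success event, isnull() can express that condition inside eval:

| eval SuccessFE = if(in('Request.status', 201, 207) AND isnull('Request.data.twoFactor.otp.expiresInMs'), 1, 0)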

As for converting it to a timechart: your chart over _time returned no results because stats removed _time from the pipeline. You can simply do

| timechart span=1h count as total sum(SuccessFE) as SuccessFE by UserAgent

instead of stats.

This will, however, give you a table which you'll first have to fillnull (because the sum might be empty) and then... hmm... probably use foreach to calculate the ratio for each UserAgent separately. Or do untable and then stats. The latter approach is probably easier.
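A minimal sketch of that idea - here using bin plus stats in place of timechart, then xyseries to pivot back into a chartable form (field names are assumed from the earlier search):

| bin _time span=1h
| stats count as total sum(SuccessFE) as SuccessFE by _time UserAgent
| eval SuccessRatioFE = round(SuccessFE/total*100, 2)
| xyseries _time UserAgent SuccessRatioFE

This sidesteps the fillnull/foreach juggling because the ratio is computed per UserAgent before the pivot.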


ITWhisperer
SplunkTrust

The appendcols command does not correlate the fields, so while the initial search will have been ordered by UserAgent (then _time), the subsearch (from the appendcols) will probably still be in (reverse) _time order. Not only that: because of the extra filter in the subsearch, it may return fewer events, so rows are pasted together purely by position and drift out of alignment.
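A tiny self-contained demonstration of that positional pasting (makeresults just generates dummy rows, so this runs anywhere):

| makeresults count=3
| streamstats count as row
| appendcols [| makeresults count=2 | streamstats count as other]

The two result sets are glued together row by row: rows 1 and 2 get other=1 and other=2, and row 3 gets no value at all, regardless of what the rows actually represent.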


sarit_s
Communicator

OK, so what is your suggestion? How can I achieve my goal?


ITWhisperer
SplunkTrust

It depends on what your goal is - just presenting us with a bunch of SPL, without a clear definition of what it is intended to do and without some sample events (anonymised, of course), makes it challenging for volunteers to spend their time trying to figure out how to help you.
