Splunk Search

average(eventcount) applied to transactions returns the wrong value sometimes

fere
Path Finder

I am comparing the results of the following two searches for one user id:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | stats avg(eventcount) avg(duration) by user_id

which returns the following for this user id (mean(eventcount) gives the same result):

user_id                     avg(eventcount)   avg(duration)

4f7b35d0d93d056a5c000028    6.000000          2297.694808

And:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | search user_id="4f7b35d0d93d056a5c000028"

which displays the following info when I click on the eventcount field in the left column:

Min: 2 Max: 8 Mean: 4 Stdev: 3.098

Value   Count   %

2       4       66.667%

8       2       33.333%

Based on the above data, the average eventcount for this user_id should be 4, not the 6 returned by the first search. avg(duration) has the same issue: the first search calculates it too high.
Any ideas what is going on here? How can I fix this?

1 Solution

gkanapathy
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as a multivalued field. The stats command handles a multivalued group-by field by treating each value as if it came from a separate event. eventcount, however, appears only once per transaction, and the "Interesting Fields" sidebar computes its count and average over the complete transactions only. So in the first case you have (probably) four transactions of two lines each (two occurrences of user_id apiece) and two transactions of eight lines each (eight occurrences of user_id apiece). Your average is therefore computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 144/24 = 6. In the second case you simply have 4 occurrences of 2 and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2) = 24/6 = 4.

This is easy to see if you add count(eventcount) to your stats command. stats will then report a count of 24, while "Interesting Fields" shows 6 transactions/events.
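To check the arithmetic outside Splunk, here is a rough Python sketch. It is not Splunk code. It assumes, as guessed above, four transactions with eventcount 2 and two with eventcount 8, and that each transaction's multivalued user_id carries one value per event. The variable names are made up for illustration.

```python
# Assumed data: eventcount of each transaction for this user_id
# (four transactions of 2 events, two transactions of 8 events).
transactions = [2] * 4 + [8] * 2

# "Interesting Fields" view: one row per transaction.
per_transaction_count = len(transactions)
per_transaction_mean = sum(transactions) / per_transaction_count

# stats ... by user_id with a multivalued user_id: each transaction is
# counted once per user_id value, i.e. (by assumption) eventcount times.
expanded = [ec for ec in transactions for _ in range(ec)]
expanded_count = len(expanded)
expanded_mean = sum(expanded) / expanded_count

print(per_transaction_count, per_transaction_mean)  # 6 4.0
print(expanded_count, expanded_mean)                # 24 6.0
```

The second average is weighted by eventcount, which is why larger transactions pull it up from 4 to 6.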
