Earn the most Karma on Answers for the month of July or August and win a free pass to Splunk .conf2015! For legal details, see Official Rules.

I have a search which returns the result as frequency table:

<code> uploads frequency 0 6 1 4 2 1 5 1 </code>

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table such as this? (ie mean should be 0.916666, and mode is 0) This seems to be something that is required quite often, yet I haven't found a way to do it.

I understand there's way to calculate stats on the data before I turn it into a frequency table, the problem is that the query is rather complex (because in order to include the `upload == 0`

case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics, the 0 events must be taken into consideration.

Comment

Up to 2 attachments (including images) can be used with a maximum of 524288 each and 1048576 total.

Accepted Answer

If you're frequency table is generated by you running `stats count`

, and then appending the other items, you can do it most efficiently with:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats mean(uploads) median(uploads) mode(uploads)
```

or, to get back the above frequency table:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats count as frequency by uploads
```

if you want, you can generate the subsearch into a file with outputcsv or outputlookup, then fetch it back in and append it.

Up to 2 attachments (including images) can be used with a maximum of 524288 each and 1048576 total.

*that* hard, but left as an exercise.

No easy way to compute all of these in a single simple query.

mean:

```
... | stats
sum(uploads) as totaluploads
sum(frequency) as totalcount
| eval mean=totaluploads/totalcount
```

mode:

```
... | sort - frequency | head 1 | eval mode=uploads
```

median:

```
... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor((totalcount/2))
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < frequency)
| eval median = uploads
```

trickery to combine them:

```
... | sort - frequency
| eventstats first(uploads) as mode
| sort uploads
| eventstats
sum(frequency) as totalcount
sum(uploads) as totaluploads
| eval mean=totaluploads/totalcount
| eval medianrank=floor((totalcount/2))
| streamstats current=t window=0
sum(frequency) as maxrank
| eval median=if((maxrank-frequency <= medianrank) AND (medianrank < frequency), uploads, null())
| where isnotnull(median)
```

Up to 2 attachments (including images) can be used with a maximum of 524288 each and 1048576 total.

Use this widget to see the actions stream for the question.

Get actions

**Tags:**

statssearch-languagetablefrequencyfrequency-table

**Asked:** Jun 28, 2011 at 02:55 PM

**Seen:** 5975 times

**Last updated:** Jun 28, '11

Get actions

statssearch-languagetablefrequencyfrequency-table

How to use stats group by _time without null fields? 0 Answers

Last event grouped by 1 Answer

Alert if value is greater than 2xSTDEV 1 Answer

iplocation and counting ips 1 Answer

Add a (sparkline) to a (table) 2 Answers

Copyright © 2005-2015 Splunk Inc. All rights reserved.

- Anonymous
- Sign in
- Create
- Ask a question
- Upload an App
- Explore
- Tags
- Answers
- Apps
- Users
- Badges

3 ● 1 ● 1 ● 4