Solved: showperc and usother for stats/charts?

jankowsr · ‎10-21-2016

I'm wondering if there is any option to have a showperc and useother functionality in stats/charts? They seem to be available only in top command which (if I'm correct) can only aggregate by count of events.
So if I want to print simple statistics for top N talkers showing for example:

Source_IP transferred_bytes perc_of_transferred_bytes number_of_tcp_flows perc_of_tcp_flows
1:
2:
3:
.....
10:
Other:

then it becomes quite complex query.

The solution for calculating percents has been provided here: https://answers.splunk.com/answers/9660/stats-count-as-a-percentage-as-the-total.html

The question for calculating "Other" statistics has been raised here:
https://answers.splunk.com/answers/68744/stats-limiting-to-top-results-and-summarizing-the-rest.html
but not answered so far. If found a way of calculating "others" in another query and merging it with the usage of the appendpipe command.

Anyway as a result of all such combinations I have a huge, complex and hard to maintain query. showperc and useother options for stats/charts would make life so much easier. Any ideas?

lguinn2 · ‎10-22-2016

I shortened the field names to make it easier to type, but I think this solution should work.

yoursearchhere
| stats sum(bytes) as xBytes, count as numFlows by src_ip
| eventstats sum(xBytes) as totalBytes, sum(numFlows) as totalFlows
| sort 10 -xBytes -numFlows
| appendpipe [ stats sum(xBytes) as topBytes, last(totalBytes) as totalBytes,
                     sum(numFlows) as topFlows, last(totalFlows) as totalFlows 
    | eval src_ip = "Other" 
    | eval xBytes = totalBytes - topBytes 
    | eval numFlows = totalFlows - topFlows ]
| where xBytes > 0
| eval percentBytes = round(xBytes * 100 / totalBytes, 1)
| eval percentFlows = round(numFlows * 100 / totalFlows, 1)
| table src_ip xBytes percentBytes numFlows percentFlows

This doesn't require subsearches. To change from a "top 10" to a "top 20," simply change the sort command. I did make it a little fancy, so that it drops the "Other" line if there are no "others" - for example, if you asked for a top 10, but there were only 9 source ips in the timerange.

View solution in original post

lguinn2 · ‎10-22-2016

I shortened the field names to make it easier to type, but I think this solution should work.

yoursearchhere
| stats sum(bytes) as xBytes, count as numFlows by src_ip
| eventstats sum(xBytes) as totalBytes, sum(numFlows) as totalFlows
| sort 10 -xBytes -numFlows
| appendpipe [ stats sum(xBytes) as topBytes, last(totalBytes) as totalBytes,
                     sum(numFlows) as topFlows, last(totalFlows) as totalFlows 
    | eval src_ip = "Other" 
    | eval xBytes = totalBytes - topBytes 
    | eval numFlows = totalFlows - topFlows ]
| where xBytes > 0
| eval percentBytes = round(xBytes * 100 / totalBytes, 1)
| eval percentFlows = round(numFlows * 100 / totalFlows, 1)
| table src_ip xBytes percentBytes numFlows percentFlows

This doesn't require subsearches. To change from a "top 10" to a "top 20," simply change the sort command. I did make it a little fancy, so that it drops the "Other" line if there are no "others" - for example, if you asked for a top 10, but there were only 9 source ips in the timerange.

jankowsr · ‎10-22-2016

Thank you Iguinn, I hoped there will be something simpler that comes with stats/chart, so adding or changing column will not be necessary in 6 places. Anyway I still find it quite smart query

lguinn2 · ‎10-24-2016

BTW, there is a "limit=N" option for the chart command, but there is no "useother" option.

A big problem with calculating the "top values" for a stats/chart is that there could be many values/sums/counts/fields in the results - which ones do you use and how do you calculate "top"? The top command computes a single counter, so it doesn't have this problem. I think this is why such an option doesn't exist today.

In your example, the bytes and the number of flows require two different initial calculations. This can probably never be simply resolved by an option.

jankowsr · ‎10-24-2016

Thank you for your response, Iguinn.
I'm fully aware of the fact that decision has to be made which column is used for sorting and selecting top values. Anyway since such decision is made having such stats with multiple aggregated columns out of the box and with the "others" row present would be to me a reasonable feature.
I don't want to propose syntax for it but I could imagine it wouldn't be something impossible.
I have just provided my tiny piece of feedback, obviously decision for the future is yours ;-).

lguinn2 · ‎10-25-2016

Oh no, believe me, it's not my decision at all! I have much less impact on feature decisions than customers. And that is really how things should be.

Although I have been thinking about whether I could write a custom search command to do this, and how it might work... 🙂

showperc and usother for stats/charts?

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life