Splunk Search

showperc and usother for stats/charts?

jankowsr
Path Finder

I'm wondering if there is any option to have a showperc and useother functionality in stats/charts? They seem to be available only in top command which (if I'm correct) can only aggregate by count of events.
So if I want to print simple statistics for top N talkers showing for example:

Source_IP transferred_bytes perc_of_transferred_bytes number_of_tcp_flows perc_of_tcp_flows
1:
2:
3:
.....
10:
Other:

then it becomes quite complex query.

The solution for calculating percents has been provided here: https://answers.splunk.com/answers/9660/stats-count-as-a-percentage-as-the-total.html

The question for calculating "Other" statistics has been raised here:
https://answers.splunk.com/answers/68744/stats-limiting-to-top-results-and-summarizing-the-rest.html
but not answered so far. If found a way of calculating "others" in another query and merging it with the usage of the appendpipe command.

Anyway as a result of all such combinations I have a huge, complex and hard to maintain query. showperc and useother options for stats/charts would make life so much easier. Any ideas?

0 Karma
1 Solution

lguinn2
Legend

I shortened the field names to make it easier to type, but I think this solution should work.

yoursearchhere
| stats sum(bytes) as xBytes, count as numFlows by src_ip
| eventstats sum(xBytes) as totalBytes, sum(numFlows) as totalFlows
| sort 10 -xBytes -numFlows
| appendpipe [ stats sum(xBytes) as topBytes, last(totalBytes) as totalBytes,
                     sum(numFlows) as topFlows, last(totalFlows) as totalFlows 
    | eval src_ip = "Other" 
    | eval xBytes = totalBytes - topBytes 
    | eval numFlows = totalFlows - topFlows ]
| where xBytes > 0
| eval percentBytes = round(xBytes * 100 / totalBytes, 1)
| eval percentFlows = round(numFlows * 100 / totalFlows, 1)
| table src_ip xBytes percentBytes numFlows percentFlows

This doesn't require subsearches. To change from a "top 10" to a "top 20," simply change the sort command. I did make it a little fancy, so that it drops the "Other" line if there are no "others" - for example, if you asked for a top 10, but there were only 9 source ips in the timerange.

View solution in original post

lguinn2
Legend

I shortened the field names to make it easier to type, but I think this solution should work.

yoursearchhere
| stats sum(bytes) as xBytes, count as numFlows by src_ip
| eventstats sum(xBytes) as totalBytes, sum(numFlows) as totalFlows
| sort 10 -xBytes -numFlows
| appendpipe [ stats sum(xBytes) as topBytes, last(totalBytes) as totalBytes,
                     sum(numFlows) as topFlows, last(totalFlows) as totalFlows 
    | eval src_ip = "Other" 
    | eval xBytes = totalBytes - topBytes 
    | eval numFlows = totalFlows - topFlows ]
| where xBytes > 0
| eval percentBytes = round(xBytes * 100 / totalBytes, 1)
| eval percentFlows = round(numFlows * 100 / totalFlows, 1)
| table src_ip xBytes percentBytes numFlows percentFlows

This doesn't require subsearches. To change from a "top 10" to a "top 20," simply change the sort command. I did make it a little fancy, so that it drops the "Other" line if there are no "others" - for example, if you asked for a top 10, but there were only 9 source ips in the timerange.

jankowsr
Path Finder

Thank you Iguinn, I hoped there will be something simpler that comes with stats/chart, so adding or changing column will not be necessary in 6 places. Anyway I still find it quite smart query

0 Karma

lguinn2
Legend

BTW, there is a "limit=N" option for the chart command, but there is no "useother" option.

A big problem with calculating the "top values" for a stats/chart is that there could be many values/sums/counts/fields in the results - which ones do you use and how do you calculate "top"? The top command computes a single counter, so it doesn't have this problem. I think this is why such an option doesn't exist today.

In your example, the bytes and the number of flows require two different initial calculations. This can probably never be simply resolved by an option.

0 Karma

jankowsr
Path Finder

Thank you for your response, Iguinn.
I'm fully aware of the fact that decision has to be made which column is used for sorting and selecting top values. Anyway since such decision is made having such stats with multiple aggregated columns out of the box and with the "others" row present would be to me a reasonable feature.
I don't want to propose syntax for it but I could imagine it wouldn't be something impossible.
I have just provided my tiny piece of feedback, obviously decision for the future is yours ;-).

0 Karma

lguinn2
Legend

Oh no, believe me, it's not my decision at all! I have much less impact on feature decisions than customers. And that is really how things should be.

Although I have been thinking about whether I could write a custom search command to do this, and how it might work... 🙂

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...