Splunk Search

limit top to return percentage of total distinct count of a field

dadi
Path Finder

Hi guys,
I've the following problem: in my system there are events of users, and I want to get only the top 10% of the active users who appears in the logs. each log entry with the user Id counts and I need only the top 10% of the users who have the most log count.

My direction is to do it with top and limit, but I'm not sure how to calculate the value of limit:
search event=active | top userId limit=?

limit should be the 10% of total unique users ( ... | stats dc(UserId) ), but I'm not sure ho to calculate the total number of unique users in the same search query.

Any suggestions? Any idea is welcome.

Tags (3)
1 Solution

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

View solution in original post

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

dadi
Path Finder

Your second query worked like a charm. I might have learned a thing or two.
Thanks a lot!

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...