Splunk Search

limit top to return percentage of total distinct count of a field

dadi
Path Finder

Hi guys,
I've the following problem: in my system there are events of users, and I want to get only the top 10% of the active users who appears in the logs. each log entry with the user Id counts and I need only the top 10% of the users who have the most log count.

My direction is to do it with top and limit, but I'm not sure how to calculate the value of limit:
search event=active | top userId limit=?

limit should be the 10% of total unique users ( ... | stats dc(UserId) ), but I'm not sure ho to calculate the total number of unique users in the same search query.

Any suggestions? Any idea is welcome.

Tags (3)
1 Solution

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

View solution in original post

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

dadi
Path Finder

Your second query worked like a charm. I might have learned a thing or two.
Thanks a lot!

0 Karma
Get Updates on the Splunk Community!

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...

WATCH NOWAs AI starts tackling low level alerts, it's more critical than ever to uplevel your threat hunting ...

Splunk APM: New Product Features + Community Office Hours Recap!

Howdy Splunk Community! Over the past few months, we’ve had a lot going on in the world of Splunk Application ...

Index This | Forward, I’m heavy; backward, I’m not. What am I?

April 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...