Splunk Search

limit top to return percentage of total distinct count of a field

dadi
Path Finder

Hi guys,
I've the following problem: in my system there are events of users, and I want to get only the top 10% of the active users who appears in the logs. each log entry with the user Id counts and I need only the top 10% of the users who have the most log count.

My direction is to do it with top and limit, but I'm not sure how to calculate the value of limit:
search event=active | top userId limit=?

limit should be the 10% of total unique users ( ... | stats dc(UserId) ), but I'm not sure ho to calculate the total number of unique users in the same search query.

Any suggestions? Any idea is welcome.

Tags (3)
1 Solution

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

View solution in original post

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

dadi
Path Finder

Your second query worked like a charm. I might have learned a thing or two.
Thanks a lot!

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...