Splunk Search

Select the Top 1 from a set of TopN

lpolo
Motivator

Using the Splunk query language, what query would return the top 1 from each set of top N results?

Data set sample:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 10:00   ciao          9 
2014-03-28 10:00   nice          7
2014-03-28 11:00   nice         11
2014-03-28 11:00   great         8 
2014-03-28 11:00   precise       6
2014-03-28 12:00   yougotit      6
2014-03-28 12:00   ok            4 
2014-03-28 12:00   thanks        3

The Splunk query should return the top 1 row of each top N set (one per time bucket). Example:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 11:00   nice         11
2014-03-28 12:00   yougotit      6

Thanks,
Lp

My solution:

After reading the suggestions in the answers below, I took the following approach:

1) Create a summary index.

2) Create an hourly scheduled search to get the top N and store the results in the summary index. Splunk query:

index="my_raw_index" | eval time=strftime(_time, "%m/%d/%Y:%H:%M")
| top limit=0 term by time | streamstats count as rank by time | table time rank term count

Result set:

time              rank  Term         count
2014-03-28 10:00   1    hello        10
2014-03-28 10:00   2    ciao          9
2014-03-28 10:00   3    nice          7

3) Then, using the rank field, it is simple to get the top 1 of each top N set from the summary index. Query example:

index=my_summary_index rank=1 | table time term count

I think this approach would scale quite well.
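Step 2 above can be sketched end to end as follows. This is only a sketch under assumptions: the `bin _time span=1h` step and the time format are my guesses at how the hourly buckets are formed, `rank` is computed per `time` value so that it restarts at 1 in every bucket, and the final `collect` is one possible way to write the rows to the summary index (it must already exist). Enabling summary indexing on the scheduled search itself is another option.

```
index="my_raw_index"
| bin _time span=1h
| eval time=strftime(_time, "%Y-%m-%d %H:%M")
| top limit=0 term by time
| streamstats count as rank by time
| table time rank term count
| collect index=my_summary_index
```

Because `top` already returns the terms of each `time` group in descending order of count, the running count from `streamstats` can serve directly as the rank.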

Thanks,
Lp


Runals
Motivator

What about

... | top 1 count by Term

melonman
Motivator

I use this one for hourly ranking, just to share.
I am assuming the same term can appear more than once in an hour, which is why I use "stats max(count) by ...". If that is not the case for your data, change it to fit your needs.

index="mine" filter_event
| bucket _time span=1h
| stats max(count) as count by term _time
| sort - count
| eval rank=1
| streamstats sum(rank) as rank by _time
| where rank<4
| xyseries _time rank term

This gives you the top 3 for each hour.
Change "where rank<4" to "where rank=1" to get only the top 1 per hour.


somesoni2
SplunkTrust

Try this:

Your base search | sort _time, -count | streamstats count as sno by _time | where sno <2

This gives the top 1 for each hour.
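To try the sort/streamstats pattern without any indexed data, here is a minimal run-anywhere sketch that rebuilds the sample rows from the question with `makeresults`. The inline sample string, the `raw` helper field, and the field names `time`, `term`, `count`, and `rank` are assumptions for illustration only. `sort 0` is used so that the default 10,000-result limit of `sort` does not truncate larger result sets.

```
| makeresults
| eval raw="2014-03-28 10:00,hello,10;2014-03-28 10:00,ciao,9;2014-03-28 10:00,nice,7;2014-03-28 11:00,nice,11;2014-03-28 11:00,great,8;2014-03-28 11:00,precise,6;2014-03-28 12:00,yougotit,6;2014-03-28 12:00,ok,4;2014-03-28 12:00,thanks,3"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<time>[^,]+),(?<term>[^,]+),(?<count>\d+)$"
| eval count=tonumber(count)
| sort 0 time, -count
| streamstats count as rank by time
| where rank=1
| table time term count
```

On the sample data this should return the three rows listed in the question's expected result (hello, nice, yougotit).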

MuS
SplunkTrust

Or you can take both examples and combine them, as in this run-anywhere example:

index=_internal | bucket _time span=1h | eval myTime=_time | stats max(kbps) as max by series, myTime | sort - myTime, max | dedup myTime, max | eval myTime=strftime(myTime, "%F %T")

This will give you the highest throughput per hour per series. You will have to adapt it to your needs.

cheers, MuS


wpreston
Motivator

Maybe this?

... your search ... | stats max(count) as count by Term

linu1988
Champion

your search|sort - time,count|dedup time,count
