Select the Top 1 from a set of TopN

lpolo · 03-28-2014

Using the Splunk query language, how would I write a query that returns the top 1 from each set of top N?

Data set sample:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 10:00   ciao          9 
2014-03-28 10:00   nice          7
2014-03-28 11:00   nice         11
2014-03-28 11:00   great         8 
2014-03-28 11:00   precise       6
2014-03-28 12:00   yougotit      6
2014-03-28 12:00   ok            4 
2014-03-28 12:00   thanks        3

The Splunk query should return the top 1 of each top-N set, i.e. the row with the highest count for each time. Example:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 11:00   nice         11
2014-03-28 12:00   yougotit      6
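To pin down the expected result, here is a small Python model (not Splunk) of "top 1 per time" on the sample rows above: for each time value, keep the row with the highest count. The rows are copied from the data set; the function name top1_per_time is only for illustration.

```python
# Sample (time, term, count) rows copied from the data set above.
ROWS = [
    ("2014-03-28 10:00", "hello", 10),
    ("2014-03-28 10:00", "ciao", 9),
    ("2014-03-28 10:00", "nice", 7),
    ("2014-03-28 11:00", "nice", 11),
    ("2014-03-28 11:00", "great", 8),
    ("2014-03-28 11:00", "precise", 6),
    ("2014-03-28 12:00", "yougotit", 6),
    ("2014-03-28 12:00", "ok", 4),
    ("2014-03-28 12:00", "thanks", 3),
]


def top1_per_time(rows):
    """Return the highest-count row for each time value, ordered by time."""
    best = {}
    for time, term, count in rows:
        # Replace the stored row only when a strictly higher count shows up.
        if time not in best or count > best[time][2]:
            best[time] = (time, term, count)
    return [best[t] for t in sorted(best)]


for row in top1_per_time(ROWS):
    print(*row)
```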

Thanks,
Lp

My solution:

After reading the suggestions in the answers below, I took the following approach:

1) Create a summary index.

2) Create an hourly scheduled search that gets the top N and stores the results in the summary index. Splunk query:

index="my_raw_index" | eval time=strftime(_time, "%Y-%m-%d %H:%M")
| top limit=0 term by time | streamstats count as rank by time | table time rank term count

Result set:

time               rank  Term         count
2014-03-28 10:00   1     hello        10
2014-03-28 10:00   2     ciao          9
2014-03-28 10:00   3     nice          7

3) Then, using the rank field, it is simple to get the top 1 of each top-N set from the summary index. Query example:

index=my_summary_index rank=1 | table time term count
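The ranking step can be sketched in Python as a rough model of the SPL pipeline, assuming the rows are already aggregated into (time, term, count): sort by time and descending count, number the rows 1, 2, 3, ... within each time (what streamstats count ... by time does on sorted input), then filter on rank == 1 as the summary-index query does. The list named summary only stands in for the summary index; all names are illustrative.

```python
from itertools import groupby

# Sample (time, term, count) rows from the question's data set.
ROWS = [
    ("2014-03-28 10:00", "hello", 10),
    ("2014-03-28 10:00", "ciao", 9),
    ("2014-03-28 10:00", "nice", 7),
    ("2014-03-28 11:00", "nice", 11),
    ("2014-03-28 11:00", "great", 8),
    ("2014-03-28 11:00", "precise", 6),
    ("2014-03-28 12:00", "yougotit", 6),
    ("2014-03-28 12:00", "ok", 4),
    ("2014-03-28 12:00", "thanks", 3),
]


def rank_within_time(rows):
    """Sort by time, then count descending, and number rows 1, 2, 3, ... per time.

    Models 'top limit=0 term by time | streamstats count as rank by time'
    on rows that are already aggregated.
    """
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    ranked = []
    for time, group in groupby(ordered, key=lambda r: r[0]):
        for rank, (_, term, count) in enumerate(group, start=1):
            ranked.append((time, rank, term, count))
    return ranked


# Stand-in for the summary index: the ranked rows written by the scheduled search.
summary = rank_within_time(ROWS)

# Models 'index=my_summary_index rank=1 | table time term count'.
top1 = [(time, term, count) for time, rank, term, count in summary if rank == 1]
for row in top1:
    print(*row)
```

Storing the rank once per hour keeps the later lookup a plain field filter, which is the point of the summary-index design described above.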

I think this approach would scale quite well.

Thanks,
Lp

Runals · 03-29-2014

What about

... | top 1 count by Term

melonman · 03-28-2014

I use this one for hourly ranking, just to share:
I am assuming the same term can appear multiple times in an hour, so I use "stats max(count) by ..."; if that is not your case, change it to fit your needs.

This gives you the top 3 for each hour.
Change where rank<4 to where rank=1 (or whatever fits your needs) and see how it goes.
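The query itself is not shown, so the following is only a hypothetical Python model of the approach described: collapse repeated terms within an hour to their max count (the "stats max(count) by ..." step, with hour and term assumed as the by-fields), rank terms within each hour, and keep rank <= n, where n=3 corresponds to rank<4. The RAW rows are made up so that one term repeats in an hour.

```python
# Hypothetical raw rows: "hello" appears twice in the 10:00 hour.
RAW = [
    ("10:00", "hello", 10),
    ("10:00", "hello", 4),
    ("10:00", "ciao", 9),
    ("10:00", "nice", 7),
    ("10:00", "ok", 2),
    ("11:00", "nice", 11),
    ("11:00", "great", 8),
]


def top_n_per_hour(rows, n):
    """Return (hour, rank, term, max_count) for the n highest-count terms per hour."""
    # Step 1, like a 'stats max(count) by hour, term' (assumed by-fields):
    # keep one max count per (hour, term).
    best = {}
    for hour, term, count in rows:
        key = (hour, term)
        if key not in best or count > best[key]:
            best[key] = count

    # Step 2: group the collapsed terms by hour.
    by_hour = {}
    for (hour, term), count in best.items():
        by_hour.setdefault(hour, []).append((term, count))

    # Step 3: rank by count (highest first) and keep rank <= n.
    result = []
    for hour in sorted(by_hour):
        ranked = sorted(by_hour[hour], key=lambda tc: -tc[1])
        for rank, (term, count) in enumerate(ranked[:n], start=1):
            result.append((hour, rank, term, count))
    return result


print(top_n_per_hour(RAW, 3))  # top 3 per hour, like where rank<4
print(top_n_per_hour(RAW, 1))  # top 1 per hour, like where rank=1
```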

somesoni2 · 03-28-2014

Try this:

Your base search | sort _time, -count | streamstats count as sno by _time | where sno <2

This gives the top 1 for each hour.

MuS · 03-28-2014

Or you can take both examples and combine them, as in this run-anywhere example:

index=_internal | bucket _time span=1h | eval myTime=_time | stats max(kbps) as max by series, myTime | sort - myTime, max | dedup myTime, max | eval myTime=strftime(myTime, "%F %T")

This gives you the highest throughput per hour per series. You will have to adapt it to match your needs.
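As a rough Python model of the first part of that search: bucket _time span=1h snaps each timestamp to the start of its hour, and stats max(kbps) ... by series, myTime keeps one maximum per series per hour. This sketch assumes epoch-second timestamps floored to whole UTC hours; the SAMPLES values are invented for illustration and are not real _internal data.

```python
from datetime import datetime, timezone

SPAN = 3600  # one hour in seconds, standing in for span=1h

# Hypothetical (epoch_seconds, series, kbps) samples.
SAMPLES = [
    (1396000800, "a", 5.0),
    (1396001400, "a", 7.5),
    (1396002000, "b", 3.0),
    (1396004400, "a", 2.0),
]


def max_per_series_per_hour(samples):
    """Floor each timestamp to its hour, then keep the max kbps per (series, hour)."""
    best = {}
    for ts, series, kbps in samples:
        hour = ts - ts % SPAN  # start of the UTC hour, like bucket _time span=1h
        key = (series, hour)
        if key not in best or kbps > best[key]:
            best[key] = kbps
    return best


result = max_per_series_per_hour(SAMPLES)
for (series, hour), kbps in sorted(result.items()):
    # Format like strftime(myTime, "%F %T"), here in UTC.
    label = datetime.fromtimestamp(hour, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    print(series, label, kbps)
```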

cheers, MuS

wpreston · 03-28-2014

Maybe this?

... your search ... | stats max(count) as count by Term

linu1988 · 03-28-2014

your search | sort -time, -count | dedup time
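The sort-then-dedup idea can also be modeled in Python. Splunk's dedup keeps the first event for each distinct combination of the listed fields, in the current order, so after sorting by count descending, deduplicating on time alone leaves the top row per time, while deduplicating on time and count together keeps every row whose count differs. The helper name dedup below is illustrative, not a Splunk API.

```python
# Sample (time, term, count) rows from the question's data set.
ROWS = [
    ("2014-03-28 10:00", "hello", 10),
    ("2014-03-28 10:00", "ciao", 9),
    ("2014-03-28 10:00", "nice", 7),
    ("2014-03-28 11:00", "nice", 11),
    ("2014-03-28 11:00", "great", 8),
    ("2014-03-28 11:00", "precise", 6),
    ("2014-03-28 12:00", "yougotit", 6),
    ("2014-03-28 12:00", "ok", 4),
    ("2014-03-28 12:00", "thanks", 3),
]


def dedup(rows, key):
    """Keep the first row for each distinct key value, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Models 'sort -time, -count': newest time first, highest count first within a time.
ordered = sorted(ROWS, key=lambda r: (r[0], r[2]), reverse=True)

top1 = dedup(ordered, key=lambda r: r[0])                        # like 'dedup time'
per_time_and_count = dedup(ordered, key=lambda r: (r[0], r[2]))  # like 'dedup time,count'

print(top1)
print(len(per_time_and_count))  # every row survives: no two rows share time and count
```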
