Splunk Search

Select the Top 1 from a set of TopN

lpolo
Motivator

Using the Splunk query language, what query would return the Top 1 from each set of Top N results?

Data set sample:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 10:00   ciao          9 
2014-03-28 10:00   nice          7
2014-03-28 11:00   nice         11
2014-03-28 11:00   great         8 
2014-03-28 11:00   precise       6
2014-03-28 12:00   yougotit      6
2014-03-28 12:00   ok            4 
2014-03-28 12:00   thanks        3

The Splunk query should return the top 1 of each Top N set. Example:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 11:00   nice         11
2014-03-28 12:00   yougotit      6

Thanks,
Lp

My solution:

After reading the suggestions in the answers below, I took the following approach:

1) Create a summary index.

2) Create an hourly scheduled search to get the Top N and store the results in the summary index. Splunk query:

index="my_raw_index" | eval time=strftime(_time, "%m/%d/%Y:%H:%M")
| top limit=0 term by time | streamstats count as rank by time | table time rank term count

Result set:

time              rank  Term         count
2014-03-28 10:00   1    hello        10
2014-03-28 10:00   2    ciao          9
2014-03-28 10:00   3    nice          7

3) Then, using the rank field, it is simple to get the Top 1 of each Top N set from the summary index. Query example:

index=my_summary_index rank=1 | table time term count

I think this approach would scale quite well.
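
For step 2, here is a sketch of the scheduled search that writes the ranked rows with the collect command instead of the summary indexing option on the saved search. These are assumptions, not what I actually deployed: the summary index my_summary_index already exists, the search runs hourly over the previous full hour, and _time is bucketed to the hour (instead of the strftime eval above) so the rank restarts for each hour.

```spl
index="my_raw_index" earliest=-1h@h latest=@h
| bucket _time span=1h
| top limit=0 term by _time
| sort 0 _time -count
| streamstats count as rank by _time
| table _time term count rank
| collect index=my_summary_index
```

The explicit sort before streamstats is there so the rank does not depend on the order in which top returns its rows. The rank=1 search from step 3 should then work against the collected rows.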

Thanks,
Lp

0 Karma

Runals
Motivator

What about

... | top 1 count by Term
0 Karma

melonman
Motivator

I use this one for hourly ranking, just to share.
I am assuming the same term can appear multiple times in an hour, so I use "stats max(count) by ..."; if that is not your case, change it to fit your needs.

index="mine" filter_event
| bucket _time span=1h
| stats max(count) as count by term _time
| sort - count
| eval rank=1
| streamstats sum(rank) as rank by _time
| where rank<4
| xyseries _time rank term

This gives you the top 3 for each hour.
Change "where rank<4" to "where rank=1" to keep only the top 1, and see how it goes.
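
For the top 1 only, the search above might look like the sketch below. It assumes the same placeholder index and filter as before. The "sort 0" is my addition so that sort does not cut off at its default result limit, and "streamstats count" replaces the eval/sum pair, which does the same numbering in one step.

```spl
index="mine" filter_event
| bucket _time span=1h
| stats max(count) as count by term _time
| sort 0 -count
| streamstats count as rank by _time
| where rank=1
| table _time term count
```

With rank=1 there is only one term per hour, so the xyseries step is not needed and a plain table is enough.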

0 Karma

somesoni2
Revered Legend

Try this:

Your base search | sort _time, -count | streamstats count as sno by _time | where sno <2

This gives the top 1 from each hour.
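
If two terms can tie for the highest count in an hour and you want to keep both, a variant with eventstats may fit better. This is only a sketch: max_count is a helper field name I made up, and it assumes the count field already exists in your base search results.

```spl
Your base search
| eventstats max(count) as max_count by _time
| where count=max_count
| fields - max_count
```

Unlike the streamstats version, this does not need a sort first, because eventstats attaches the per-hour maximum to every row before the filter runs.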

MuS
Legend

Or you can take both examples and combine them, like this run-anywhere example:

index=_internal | bucket _time span=1h | eval myTime=_time | stats max(kbps) as max by series, myTime | sort - myTime, max | dedup myTime, max | eval myTime=strftime(myTime, "%F %T")

This will give you the highest throughput per hour per series. You have to adapt it to match your needs.

cheers, MuS

0 Karma

wpreston
Motivator

Maybe this?

... your search ... | stats max(count) as count by Term

linu1988
Champion

your search | sort - time, count | dedup time
