Using the Splunk query language, what would a Splunk query look like that returns the Top 1 from each set of Top N?
Data set sample:
time Term count
2014-03-28 10:00 hello 10
2014-03-28 10:00 ciao 9
2014-03-28 10:00 nice 7
2014-03-28 11:00 nice 11
2014-03-28 11:00 great 8
2014-03-28 11:00 precise 6
2014-03-28 12:00 yougotit 6
2014-03-28 12:00 ok 4
2014-03-28 12:00 thanks 3
The Splunk query should return the top 1 of each Top N set (one row per hour). Example:
time Term count
2014-03-28 10:00 hello 10
2014-03-28 11:00 nice 11
2014-03-28 12:00 yougotit 6
Thanks,
Lp
My solution:
After reading the suggestions provided in the answers below, I took the following approach:
1) Create a summary index.
2) Create an hourly scheduled search to get the Top N and store the results in the summary index. Splunk query:
index="my_raw_index" | eval time=strftime(_time, "%Y-%m-%d %H:00") |
top limit=0 term by time | streamstats count as rank by time | table time rank term count
Result set:
time rank Term count
2014-03-28 10:00 1 hello 10
2014-03-28 10:00 2 ciao 9
2014-03-28 10:00 3 nice 7
3) Then, using the rank field, it is simple to get the Top 1 of each Top N set from the summary index. Query example:
index=my_summary_index rank=1 | table time term count
I think this approach would scale quite well.
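For anyone who wants to try steps 2 and 3, here is a rough sketch of what the hourly scheduled search could look like. It assumes the summary index is named my_summary_index and already exists, and that the results are written with the collect command rather than with the summary-indexing option of the saved search. The earliest/latest window (the previous full hour) is also an assumption, not something taken from my actual setup:

```
index="my_raw_index" earliest=-1h@h latest=@h
| eval time=strftime(_time, "%Y-%m-%d %H:00")
| top limit=0 term by time
| streamstats count as rank by time
| table time rank term count
| collect index=my_summary_index
```

Because top returns the terms of each hour in descending order of count, the running streamstats count per time value can serve as the rank, and rank=1 then picks the top term of each hour.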
Thanks,
Lp
What about
... | top 1 count by Term
I use this one for hourly ranking, just to share.
I am assuming the same term can appear more than once within an hour, which is why I use "stats max(count) by ...". If that is not your case, please change it to fit your needs.
index="mine" filter_event
| bucket _time span=1h
| stats max(count) as count by term _time
| sort - count
| eval rank=1
| streamstats sum(rank) as rank by _time
| where rank<4
| xyseries _time rank term
This gives you the top 3 for each hour.
Change "where rank<4" to "where rank=1" to get only the top 1, and see how it goes.
Try this:
Your base search | sort _time, -count | streamstats count as sno by _time | where sno <2
This gives the top 1 for each hour.
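To check the idea without touching real data, the sample from the question can be rebuilt with makeresults. This is only a sketch: the field names raw, time, Term and count are chosen here to mirror the question, and the sample rows are hard-coded as a delimited string. The last four commands are the same sort / streamstats / where pattern as above:

```
| makeresults
| eval raw="2014-03-28 10:00,hello,10;2014-03-28 10:00,ciao,9;2014-03-28 10:00,nice,7;2014-03-28 11:00,nice,11;2014-03-28 11:00,great,8;2014-03-28 11:00,precise,6;2014-03-28 12:00,yougotit,6;2014-03-28 12:00,ok,4;2014-03-28 12:00,thanks,3"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<time>[^,]+),(?<Term>[^,]+),(?<count>\d+)$"
| eval count=tonumber(count)
| sort 0 time, -count
| streamstats count as sno by time
| where sno=1
| table time Term count
```

With the sample rows this should return hello (10), nice (11) and yougotit (6), one row per hour. The "sort 0" removes the default result limit of sort, which matters once the base search returns more than 10,000 rows.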
Or you can take both examples and combine them, like in this run-everywhere example:
index=_internal | bucket _time span=1h | eval myTime=_time | stats max(kbps) as max by series, myTime | sort - myTime, max | dedup myTime, max | eval myTime=strftime(myTime, "%F %T")
This will give you the highest throughput per hour per series. You have to adapt it to match your needs.
cheers, MuS
maybe this?
... your search ... | stats max(count) as count by Term
your search | sort -time, -count | dedup time