Splunk Search

How to combine top and bin in splunk query

JamesWierzba
Observer

I need to find abnormalities in my data. The data I have is individual views for certain movie titles. I need to find content that was abnormally popular over some small time interval, say 1 hour. And check a few weeks worth of data.

One option is to run a query manually for each hour

 

 

 

``` Run this over 60m time window ```
index=mydata 
| top limit=100 movieId

 

 

 

Obviously I don't want to run this query 24 * 7 = 168 times for one weeks worth of data.

How can I bin the data into time buckets, and get a percentage ratio by movieId? This is what I came up with:

 

 

 

``` Run this over 1+ week ```
index=mydata
| bin span=60m _time
| top limit=100 movieId, _time

 

 

 

This does not help me because the output of `top` is showing me a percentage based on the entire input set of data. I need a "local" percentage, i.e. a percentage based on only that slice of data in the bin.

 

I'm wondering if eventstats or streamstats can be useful here but I was not able to come up with a query using those commands

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

So rather than use top, which is somewhat useful, it is generally easier to get results using stats

index=mydata
| bin _time span=1h
| stats count by _time movieId
| eventstats sum(count) as total by _time
| eval percent=round(count/total*100, 2)

so this search will give you 1h buckets with counts by movieId then the eventstats will calculate the total per hour of all movies and then the percent calc will get the percentage of each movie within that hour.

However, although efficient, I expect what you may want is a streamstats variant, which will give you a sliding 60 minute window, so if your peak for a movie is from 20:30 to 21:30 this will show using streamstats, but not necessarily stats by 1h buckets

You could do something like this

index=mydata
| streamstats time_window=1h count as userCount by movieId
| streamstats time_window=1h count as totalCount
| eval percent=round(userCount/totalCount*100, 2)
| timechart span=1h max(percent) as maxPercent by movieId

which will show the max percent for all movies in 1h buckets, but with the time calculated as a sliding window

You can then test for thresholds or further manipulate your data - the timechart above is one way of looking at it, but you can do anything from there

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...