How to combine top and bin in splunk query

JamesWierzba · ‎09-13-2023

I need to find abnormalities in my data. The data I have is individual views for certain movie titles. I need to find content that was abnormally popular over some small time interval, say 1 hour. And check a few weeks worth of data.

One option is to run a query manually for each hour

``` Run this over 60m time window ```
index=mydata 
| top limit=100 movieId

Obviously I don't want to run this query 24 * 7 = 168 times for one weeks worth of data.

How can I bin the data into time buckets, and get a percentage ratio by movieId? This is what I came up with:

``` Run this over 1+ week ```
index=mydata
| bin span=60m _time
| top limit=100 movieId, _time

This does not help me because the output of `top` is showing me a percentage based on the entire input set of data. I need a "local" percentage, i.e. a percentage based on only that slice of data in the bin.

I'm wondering if eventstats or streamstats can be useful here but I was not able to come up with a query using those commands

bowesmana · ‎09-13-2023

So rather than use top, which is somewhat useful, it is generally easier to get results using stats

index=mydata
| bin _time span=1h
| stats count by _time movieId
| eventstats sum(count) as total by _time
| eval percent=round(count/total*100, 2)

so this search will give you 1h buckets with counts by movieId then the eventstats will calculate the total per hour of all movies and then the percent calc will get the percentage of each movie within that hour.

However, although efficient, I expect what you may want is a streamstats variant, which will give you a sliding 60 minute window, so if your peak for a movie is from 20:30 to 21:30 this will show using streamstats, but not necessarily stats by 1h buckets

You could do something like this

index=mydata
| streamstats time_window=1h count as userCount by movieId
| streamstats time_window=1h count as totalCount
| eval percent=round(userCount/totalCount*100, 2)
| timechart span=1h max(percent) as maxPercent by movieId

which will show the max percent for all movies in 1h buckets, but with the time calculated as a sliding window

You can then test for thresholds or further manipulate your data - the timechart above is one way of looking at it, but you can do anything from there

How to combine top and bin in splunk query

stats

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

They're back! Join the SplunkTrust and MVP at .conf24

Enterprise Security Content Update (ESCU) | New Releases