I need to find abnormalities in my data. The data I have is individual views for certain movie titles. I need to find content that was abnormally popular over some small time interval, say 1 hour. And check a few weeks worth of data.
One option is to run a query manually for each hour
``` Run this over 60m time window ```
index=mydata
| top limit=100 movieId
Obviously I don't want to run this query 24 * 7 = 168 times for one weeks worth of data.
How can I bin the data into time buckets, and get a percentage ratio by movieId? This is what I came up with:
``` Run this over 1+ week ```
index=mydata
| bin span=60m _time
| top limit=100 movieId, _time
This does not help me because the output of `top` is showing me a percentage based on the entire input set of data. I need a "local" percentage, i.e. a percentage based on only that slice of data in the bin.
I'm wondering if eventstats or streamstats can be useful here but I was not able to come up with a query using those commands
So rather than use top, which is somewhat useful, it is generally easier to get results using stats
index=mydata
| bin _time span=1h
| stats count by _time movieId
| eventstats sum(count) as total by _time
| eval percent=round(count/total*100, 2)
so this search will give you 1h buckets with counts by movieId then the eventstats will calculate the total per hour of all movies and then the percent calc will get the percentage of each movie within that hour.
However, although efficient, I expect what you may want is a streamstats variant, which will give you a sliding 60 minute window, so if your peak for a movie is from 20:30 to 21:30 this will show using streamstats, but not necessarily stats by 1h buckets
You could do something like this
index=mydata
| streamstats time_window=1h count as userCount by movieId
| streamstats time_window=1h count as totalCount
| eval percent=round(userCount/totalCount*100, 2)
| timechart span=1h max(percent) as maxPercent by movieId
which will show the max percent for all movies in 1h buckets, but with the time calculated as a sliding window
You can then test for thresholds or further manipulate your data - the timechart above is one way of looking at it, but you can do anything from there