We currently have our web filtering logs forwarded to Splunk. I have been asked to provide a report that shows not just the top users browsing the web, but the users who browse the web excessively. I have been fumbling around with the percentile functions of stats but am having some trouble. An event is generated for every request that is made, and each event contains a USER field. I would like to determine the average number of events per user per day and report the users who have breached a threshold based on that number, for example a count that exceeds the 95th percentile or is 4x the average.
I thank you in advance for any help you can provide.
Or try this
yoursearchhere
| bucket _time span=1d
| stats count by USER _time
| eventstats p95(count) as topPercentile
| where count >= topPercentile
And, as Martin suggested, if you just want to report who exceeded the 95th percentile in the last day, just do this
yoursearchhere
| bucket _time span=1d
| stats count by USER _time
| eventstats p95(count) as topPercentile
| where count >= topPercentile AND _time >= relative_time(now(), "-1d@d")
As a simple search, add a | where _time = relative_time(now(), "-d@d")
to get yesterday's breachers.
In the long run, you should consider writing the daily counts to a summary index. Use that to compute your 30-day moving average and compare it to the relevant day, so you avoid running the full 30-day search every time just to get the average.
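A rough, untested sketch of the summary-index idea follows. It assumes a summary index has already been created; yoursummaryindex is a placeholder name, not something from this thread. It also assumes the events written by collect keep the bucketed _time and the USER and count fields. The first search would be scheduled once a day to store yesterday's per-user counts:

```spl
yoursearchhere earliest=-1d@d latest=@d
| bucket _time span=1d
| stats count by USER _time
| collect index=yoursummaryindex
```

The daily report can then read the last 30 days from the much smaller summary data and compare yesterday's counts to 4x the average:

```spl
index=yoursummaryindex earliest=-30d@d latest=@d
| eventstats avg(count) as avgDaily
| where _time >= relative_time(now(), "-1d@d") AND count > 4*avgDaily
| sort - count
```

Note that avgDaily here is the average over every user-day in the window, so users with no events on a given day do not pull it down.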
Thank you for your suggestion. When I run the search I see a set of users at or above the p95 for each day. Is it possible to just show the users from the last day while comparing their use against the average top percentile count over the last 30 days?
I apologize for not explaining my request better. I would like to determine what is the average daily count or the average top percentile over 30 days and then generate a daily report that shows the users that have breached this number/threshold.
Thank you again for your help.
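One way to express that request, as an untested sketch: compute the 95th percentile separately for each day, average those daily values over the 30 days, and then keep only yesterday's users above that average. The field names dailyP95, rowInDay, onePerDay and threshold are made up for this example. The streamstats/eval pair keeps one dailyP95 value per day, so days with many users do not weigh more heavily in the average.

```spl
yoursearchhere earliest=-30d@d latest=@d
| bucket _time span=1d
| stats count by USER _time
| eventstats p95(count) as dailyP95 by _time
| streamstats count as rowInDay by _time
| eval onePerDay = if(rowInDay == 1, dailyP95, null())
| eventstats avg(onePerDay) as threshold
| where _time >= relative_time(now(), "-1d@d") AND count > threshold
| sort - count
| fields USER count threshold
```

Because the search window ends at @d (midnight today), the final where clause leaves only yesterday's rows.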
Here's a rough untested thought, assuming the base search provides exactly one event per relevant web access.
base search | bucket _time span=1d | stats count by USER _time | eventstats avg(count) as avgCount | where count > 4*avgCount
Thank you. This gets me closer to what I am looking for.