I have a search that pipes this:
stats count,values(category) by src_user src_ip
It returns results with a ton of categories per user and IP. I would like to limit the values(category) to the top 3 per result.
Does anyone know how to accomplish this?
Thanks,
Dave
Try this:
| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values by src_user,src_ip
Explanation
| stats count by src_user,src_ip,category
-> you need the count of every category to find out the top 3
| sort -count
-> get the most frequent categorys first
| streamstats count as counter by src_user,src_ip
-> add a rank/counter for the categories by frequency
| stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values by src_user,src_ip
->only take the categories into the final result that have a rank that is smaller than 4 ( = top 3)
----- Udpate -----
You could use either the ranking based or a percentage based "base search":
| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | where counter ❤️
| stats count by src_user,src_ip,category | eventstats sum(count) as total_hits by src_user,src_ip | eval percentage=count/total_hits | where percentage>0.33
And the filter for categories you are interested in. If you know the interesting categories you could append:
| search category=xy OR category=yz
or maybe
| search category!=xy AND category!=yz
If you want to include/exclude the categories based on the frequency they occur you could have a seperate search that populates a lookup with the categories you want to include/exclude (this would be a kind of baseline) and then use that to filter your results. But from what you wrote I am guessing that you know the NSFW categories
Try this:
| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values by src_user,src_ip
Explanation
| stats count by src_user,src_ip,category
-> you need the count of every category to find out the top 3
| sort -count
-> get the most frequent categorys first
| streamstats count as counter by src_user,src_ip
-> add a rank/counter for the categories by frequency
| stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values by src_user,src_ip
->only take the categories into the final result that have a rank that is smaller than 4 ( = top 3)
----- Udpate -----
You could use either the ranking based or a percentage based "base search":
| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | where counter ❤️
| stats count by src_user,src_ip,category | eventstats sum(count) as total_hits by src_user,src_ip | eval percentage=count/total_hits | where percentage>0.33
And the filter for categories you are interested in. If you know the interesting categories you could append:
| search category=xy OR category=yz
or maybe
| search category!=xy AND category!=yz
If you want to include/exclude the categories based on the frequency they occur you could have a seperate search that populates a lookup with the categories you want to include/exclude (this would be a kind of baseline) and then use that to filter your results. But from what you wrote I am guessing that you know the NSFW categories
Yeah, it helped Chris! I just had to wrap my head around how it would work out. I will convey this to my customer and hopefully, it is what they are looking for.
I updated the answer, let me know if the update helps
Hi Dave, I do not understand what information the final result should contain yet. You could edit the the if clause of the second stats to exclude/include categories if you know what they are:
| stats sum(count) as total_count list(eval(if((counter<4) AND (category!="search-engines" OR category!="business-and-economy"),category,null()))) as values by src_user,src_ip
That way you still get a result per User & IP with the overall total count of page counts.
Hi Chris,
I don't want to reverse the sort. What I am really looking for is aside from people doing their normal jobs, I would like to see if sports or dating sites are being accessed in high page counts. So I would like to exclude the above categories but still preserve whether they are frequenting NSFW (really not acceptable) sites.
Perhaps another way to go about this is to create a percent of sport versus overall page views. However for that to work, I would only want to see when sports was above say 33%.
Thanks for helping me think this through!
Dave
Have you tried changeing the "| sort -count" to "|sort count"?
Can we take this one step further and then limit the category results to only report if not certain values? For example, my most common results are computer-and-internet-info, business-and-economy, search-engines. I would like the results to ignore the most common and only return the results with rare values at the top such as the sports category.
Any ideas? Thanks!
Dave