Splunk Search

Limit values

ccsfdave
Builder

I have a search that pipes this:

stats count,values(category) by src_user src_ip

It returns results with a ton of categories per user and IP. I would like to limit the values(category) to the top 3 per result.

Does anyone know how to accomplish this?

Thanks,

Dave

Tags (1)
1 Solution

chris
Motivator

Try this:

| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip  | stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values  by src_user,src_ip

Explanation

| stats count by src_user,src_ip,category 
-> you need the count of every category to find out the top 3 

| sort -count 
-> get the most frequent categorys first

| streamstats count as counter by src_user,src_ip
-> add a rank/counter for the categories by frequency

| stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values  by src_user,src_ip
->only take the categories into the final result that have a rank that is smaller than 4 ( = top 3)

----- Udpate -----

You could use either the ranking based or a percentage based "base search":

| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | where counter ❤️ 

| stats count by src_user,src_ip,category | eventstats sum(count) as total_hits by src_user,src_ip | eval percentage=count/total_hits | where percentage>0.33 

And the filter for categories you are interested in. If you know the interesting categories you could append:

| search category=xy OR category=yz

or maybe

| search category!=xy AND category!=yz

If you want to include/exclude the categories based on the frequency they occur you could have a seperate search that populates a lookup with the categories you want to include/exclude (this would be a kind of baseline) and then use that to filter your results. But from what you wrote I am guessing that you know the NSFW categories

View solution in original post

chris
Motivator

Try this:

| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip  | stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values  by src_user,src_ip

Explanation

| stats count by src_user,src_ip,category 
-> you need the count of every category to find out the top 3 

| sort -count 
-> get the most frequent categorys first

| streamstats count as counter by src_user,src_ip
-> add a rank/counter for the categories by frequency

| stats sum(count) as total_count list(eval(if(counter<4,category,null()))) as values  by src_user,src_ip
->only take the categories into the final result that have a rank that is smaller than 4 ( = top 3)

----- Udpate -----

You could use either the ranking based or a percentage based "base search":

| stats count by src_user,src_ip,category | sort -count | streamstats count as counter by src_user,src_ip | where counter ❤️ 

| stats count by src_user,src_ip,category | eventstats sum(count) as total_hits by src_user,src_ip | eval percentage=count/total_hits | where percentage>0.33 

And the filter for categories you are interested in. If you know the interesting categories you could append:

| search category=xy OR category=yz

or maybe

| search category!=xy AND category!=yz

If you want to include/exclude the categories based on the frequency they occur you could have a seperate search that populates a lookup with the categories you want to include/exclude (this would be a kind of baseline) and then use that to filter your results. But from what you wrote I am guessing that you know the NSFW categories

ccsfdave
Builder

Yeah, it helped Chris! I just had to wrap my head around how it would work out. I will convey this to my customer and hopefully, it is what they are looking for.

0 Karma

chris
Motivator

I updated the answer, let me know if the update helps

0 Karma

chris
Motivator

Hi Dave, I do not understand what information the final result should contain yet. You could edit the the if clause of the second stats to exclude/include categories if you know what they are:
| stats sum(count) as total_count list(eval(if((counter<4) AND (category!="search-engines" OR category!="business-and-economy"),category,null()))) as values by src_user,src_ip
That way you still get a result per User & IP with the overall total count of page counts.

0 Karma

ccsfdave
Builder

Hi Chris,

I don't want to reverse the sort. What I am really looking for is aside from people doing their normal jobs, I would like to see if sports or dating sites are being accessed in high page counts. So I would like to exclude the above categories but still preserve whether they are frequenting NSFW (really not acceptable) sites.

Perhaps another way to go about this is to create a percent of sport versus overall page views. However for that to work, I would only want to see when sports was above say 33%.

Thanks for helping me think this through!

Dave

0 Karma

chris
Motivator

Have you tried changeing the "| sort -count" to "|sort count"?

0 Karma

ccsfdave
Builder

Can we take this one step further and then limit the category results to only report if not certain values? For example, my most common results are computer-and-internet-info, business-and-economy, search-engines. I would like the results to ignore the most common and only return the results with rare values at the top such as the sports category.

Any ideas? Thanks!

Dave

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...