Splunk Search

How can I top results from a search when using list(field)

tkwaller
Builder

Heres my current search:

index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist |  where isnull(Whitelist) | eval useragent=urldecode(useragent) |  dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/stuff/v2")) as "SearchV2", count(eval(uri_path="shape/search/stuff/v2")) as "ShapeV2", count(eval(uri_path="/search/stuff/v1")) as "SearchV1", count as EventCount by src_ip |  eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100  | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount 

This search give me reults such as:

src_ip  Number_of_userAgents    useragent_list  hitPercentage   EventCount  SearchV2    ShapeV2 SearchV1    TotalCount
XXX.XXX.XXX.XX1 10  
Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.4
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/9.0 Mobile/13A404 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A404 Sarafi/601.1.56
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/601.1.39 (KHTML, like Gecko) Version/9.0 Mobile/13D15 Sarafi/601.1
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Sarafi/600.1.4
Mozilla/5.0 (iPad; CPU OS 9_3 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13C75 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13E238 Sarafi/601.1.56
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Sarafi/601.1
Mozilla/5.0 (iPad; CPU OS 9_0_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.46
0   10  0   0   0   0
XXX.XXX.XXX.XX2 8   
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
75  8   2   0   4   6
XXX.XXX.XXX.XX3 7   
MacOutlook/15.24.0.160709 (Intel Mac OS X Version 10.11.6 (Build 15G1108))
OC/4.0.7577.4500 (Microsoft Lync 2010)
MacOutlook/f.15.1.160411 (Intel Mac OS X Version 10.11.6 (Build 15G1004))
SHMessage/APP_VERSION_NUMBER (iPhone; iOS 10.2; Scale/2.00)
Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Mobile/14C89
Microsoft Office/16.0 (Windows NT 6.1; ucmapi 16.0.4456; Pro)
swcd (unknown version) CFNetwork/808.2.16 Darwin/15.6.0
0   7   0   0   0   0

What I am trying to do is get a list of src_ip's that are null in the "Whitelist.csv" file along with a list of their known useragents, which works.
What I want to do now is to list the useragents seen in the results from the most common to least.
So I can say, these are the most used useragents in instances where people are likely scraping.
I tried several options but couldnt get any to work, the problem is with list(useragent).

Any thoughts on how I could accomplish this?
Thanks for the help

0 Karma

mgrosholz
Path Finder

What is the time spent on the search? i.e. is it already a fairly long search to complete?
I was able to do something similar by utilizing a subsearch. But subsearches slow the search-time results.

You can try...

index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx [search index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist |  where isnull(Whitelist) | eval useragent=urldecode(useragent) |  dedup useragent | top limit=x useragents | table http_user_agent, useragent] | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/stuff/v2")) as "SearchV2", count(eval(uri_path="shape/search/stuff/v2")) as "ShapeV2", count(eval(uri_path="/search/stuff/v1")) as "SearchV1", count as EventCount by src_ip |  eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100  | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount 

If you are unsure about your subsearch just put "| format" behind it and you will see what it pipes to your search.

0 Karma

somesoni2
Revered Legend

Try this

index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/inventory/v2")) as "SearchV2", count(eval(uri_path="shape/search/inventory/v2")) as "ShapeV2", count(eval(uri_path="/search/inventory/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount | mvexpand useragent_list | top useragent_list

OR

index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/inventory/v2")) as "SearchV2", count(eval(uri_path="shape/search/inventory/v2")) as "ShapeV2", count(eval(uri_path="/search/inventory/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount | stats count by useragent_list | sort -count
0 Karma

tkwaller
Builder

The first search returns results I am unsure about so I am checking that. The second is count of 1 for each useragent. I thought about multikv too but I didnt get that to work either.

0 Karma

somesoni2
Revered Legend

What is the final output that you want? It could be possible that we don't have to rely on the output of your stats but directly get the top useragents after dedup (or before).

0 Karma

tkwaller
Builder

Here is essentially what I am trying to do
If instances of a specific IP hitting one or more of those uri_paths is over, say, 20% of the total amount for that host then we include that host and all the useragents they use
and then put the useragents together all together and top that.

0 Karma

somesoni2
Revered Legend

Give this a try as well

index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx NOT[ | inputlookup whitelistip.csv  | table src_ip ]  | eval useragent=urldecode(useragent)  | stats count by useragent | sort -count
0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...