Heres my current search:
index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/stuff/v2")) as "SearchV2", count(eval(uri_path="shape/search/stuff/v2")) as "ShapeV2", count(eval(uri_path="/search/stuff/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount
This search give me reults such as:
src_ip Number_of_userAgents useragent_list hitPercentage EventCount SearchV2 ShapeV2 SearchV1 TotalCount
XXX.XXX.XXX.XX1 10
Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.4
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/9.0 Mobile/13A404 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A404 Sarafi/601.1.56
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/601.1.39 (KHTML, like Gecko) Version/9.0 Mobile/13D15 Sarafi/601.1
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Sarafi/600.1.4
Mozilla/5.0 (iPad; CPU OS 9_3 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13C75 Sarafi/601.1.46
Mozilla/5.0 (iPad; CPU OS 9_0 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13E238 Sarafi/601.1.56
Mozilla/5.0 (iPad; CPU OS 9_0_2 like Mac OS X) AppleWebKit/601.1.17 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Sarafi/601.1
Mozilla/5.0 (iPad; CPU OS 9_0_1 like Mac OS X) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Mobile/13A452 Sarafi/601.1.46
0 10 0 0 0 0
XXX.XXX.XXX.XX2 8
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
75 8 2 0 4 6
XXX.XXX.XXX.XX3 7
MacOutlook/15.24.0.160709 (Intel Mac OS X Version 10.11.6 (Build 15G1108))
OC/4.0.7577.4500 (Microsoft Lync 2010)
MacOutlook/f.15.1.160411 (Intel Mac OS X Version 10.11.6 (Build 15G1004))
SHMessage/APP_VERSION_NUMBER (iPhone; iOS 10.2; Scale/2.00)
Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Mobile/14C89
Microsoft Office/16.0 (Windows NT 6.1; ucmapi 16.0.4456; Pro)
swcd (unknown version) CFNetwork/808.2.16 Darwin/15.6.0
0 7 0 0 0 0
What I am trying to do is get a list of src_ip's that are null in the "Whitelist.csv" file along with a list of their known useragents, which works.
What I want to do now is to list the useragents seen in the results from the most common to least.
So I can say, these are the most used useragents in instances where people are likely scraping.
I tried several options but couldnt get any to work, the problem is with list(useragent).
Any thoughts on how I could accomplish this?
Thanks for the help
What is the time spent on the search? i.e. is it already a fairly long search to complete?
I was able to do something similar by utilizing a subsearch. But subsearches slow the search-time results.
You can try...
index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx [search index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | top limit=x useragents | table http_user_agent, useragent] | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/stuff/v2")) as "SearchV2", count(eval(uri_path="shape/search/stuff/v2")) as "ShapeV2", count(eval(uri_path="/search/stuff/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount
If you are unsure about your subsearch just put "| format" behind it and you will see what it pipes to your search.
Try this
index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/inventory/v2")) as "SearchV2", count(eval(uri_path="shape/search/inventory/v2")) as "ShapeV2", count(eval(uri_path="/search/inventory/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount | mvexpand useragent_list | top useragent_list
OR
index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx | lookup whitelistip.csv src_ip OUTPUTNEW src_ip as Whitelist | where isnull(Whitelist) | eval useragent=urldecode(useragent) | dedup useragent | stats dc(http_user_agent) AS Number_of_userAgents, list(useragent) as useragent_list, count(eval(uri_path="/search/inventory/v2")) as "SearchV2", count(eval(uri_path="shape/search/inventory/v2")) as "ShapeV2", count(eval(uri_path="/search/inventory/v1")) as "SearchV1", count as EventCount by src_ip | eval TotalCount=SearchV2+ShapeV2+SearchV1 | eval hitPercentage=(TotalCount/EventCount) * 100 | fields src_ip, Number_of_userAgents,useragent_list, hitPercentage, EventCount, SearchV2, ShapeV2, SearchV1, TotalCount | stats count by useragent_list | sort -count
The first search returns results I am unsure about so I am checking that. The second is count of 1 for each useragent. I thought about multikv too but I didnt get that to work either.
What is the final output that you want? It could be possible that we don't have to rely on the output of your stats but directly get the top useragents after dedup (or before).
Here is essentially what I am trying to do
If instances of a specific IP hitting one or more of those uri_paths is over, say, 20% of the total amount for that host then we include that host and all the useragents they use
and then put the useragents together all together and top that.
Give this a try as well
index=akamai src_ip!=xxx.xx.xx.xx AND src_ip!=xxx.xx.xx.xx NOT[ | inputlookup whitelistip.csv | table src_ip ] | eval useragent=urldecode(useragent) | stats count by useragent | sort -count