I'm looking for spiders, which I can identify by abusive request rates using transactions. For example: SPLUNK_SEARCH='sourcetype="access_combined" startminutesago=5 | transaction fields=clientip maxspan=6m maxpause=1m | search linecount > 500'
This identifies spiders or abusive traffic based on a business rule; the 500 threshold could be higher or lower.
I would like a search with, say, linecount > 50 to find a list of IPs, and then find out which IPs have more than 20 (or X) different user agents. This would help identify spiders that are trying to fly under the radar with a smaller transaction count, switching their user agent on each hit to look more legitimate.
A much better search would avoid the use of transaction and instead do:
sourcetype=access_combined earliest=-5m | stats distinct_count(user_agent) as ip_agent_count by clientip | where ip_agent_count >= 20
Your first query is much better written as:
sourcetype=access_combined earliest=-5m | stats count by clientip | where count > 500
In general, the stats searches will scale roughly linearly with the number of indexers in your indexing cluster, while transaction does not map-reduce as well and so will bottleneck on the search head.
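If the goal is to catch spiders that keep the hit count low but rotate user agents (the linecount > 50 plus 20-or-more-agents case from the question), both thresholds can be checked in a single stats pass. A sketch, assuming your extraction names the field user_agent as above (adjust if it produces useragent instead) and that 50 and 20 are tunable business-rule thresholds:
sourcetype=access_combined earliest=-5m | stats count as hits distinct_count(user_agent) as agent_count by clientip | where hits > 50 AND agent_count >= 20
This keeps the work distributable across the indexers for the same reason as the other stats searches: the per-clientip aggregation runs on the indexers and only the reduced results reach the search head.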