Hi, I want to find the average number of different IP's over a given time and list all IP's above that count
so if 10 different IP's are listed twice each and another 3 IP's are listed 100 times
the median average is 2 - and i want to list the 3 IP's that are greater than this
I can get the average by
index=uk sourcetype=access-log earliest=-10m | top ip limit=0 | stats median(count) as avg-ip
I can list all IPs with a count greater than a fixed value
index=uk sourcetype=access-log earliest=-10m | top ip limit=0 | search count > 150
But i am struggling doing a sub search to combine the two
This for example returns nothing
[search index=uk sourcetype=access-log earliest=-10m | top ip limit=0 | stats median(count) as avg-ip | fields avg-ip] index=uk sourcetype=access-log earliest=-10m | top ip limit=0 | search count > avg-ip
As is often the case, it feels like a job for a subsearch, but it's far better to handle this with stats and it's strange cousins eventstats and streamstats. Here's your answer I think.
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount
The first eventstats goes through all the incoming events and paints a little "thisIpCount" on each event (at least, on each event that has a value for "ip"). The thisIpCount field will be 3 if that IP occurs exactly three times in the whole set of events.
The next eventstats goes over the entire set of events again and calculates the overall median value of the "thisIpCount" field and paints that onto every row as a new field called "medianIpCount".
Lastly we just use a where clause to filter the set so we're left with only the events where the ip in the event appeared more than the median.
Note that the search as written above will still return the raw events. If you instead want to roll up this new filtered set of IP's, you would add another stats clause on the end and do something like this.
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount | top ip
OR if you want the resulting table to keep a record of what that overall median was, here's another example:
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount | stats count last(medianIpCount) as medianIpCount by ip | sort - count
As is often the case, it feels like a job for a subsearch, but it's far better to handle this with stats and it's strange cousins eventstats and streamstats. Here's your answer I think.
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount
The first eventstats goes through all the incoming events and paints a little "thisIpCount" on each event (at least, on each event that has a value for "ip"). The thisIpCount field will be 3 if that IP occurs exactly three times in the whole set of events.
The next eventstats goes over the entire set of events again and calculates the overall median value of the "thisIpCount" field and paints that onto every row as a new field called "medianIpCount".
Lastly we just use a where clause to filter the set so we're left with only the events where the ip in the event appeared more than the median.
Note that the search as written above will still return the raw events. If you instead want to roll up this new filtered set of IP's, you would add another stats clause on the end and do something like this.
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount | top ip
OR if you want the resulting table to keep a record of what that overall median was, here's another example:
<your search> | eventstats count as thisIpCount by ip | eventstats median(count) as medianIpCount | where thisIpCount>medianIpCount | stats count last(medianIpCount) as medianIpCount by ip | sort - count