Hello fellow Splunkers!
I'm trying to recreate an existing report for my firewall guy within Splunk with hopes of replacing the system he's currently using. I'm a bit stuck though as he wants to break down traffic and get counts on 3 levels based on top talkers so we have a total per SourceIP, a total per DestinationIP hitting each SourceIP, and finally a total per Status hitting each SourceIP/DestinationIP combo. Something like so:
SourceIP | DestinationIP | HTTPStatus | NumRequests | ||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||
|
The log data is pretty straightforward. Each event has data in the 3 fields SourceIP, DestinationIP, and HTTPStatus, and NumRequests is just counting them. Something like so:
[2014-08-08 04:17:22.000 -0700] Type="WebRequest" SourceIP="1.2.3.4" DestinationIP="2.3.4.5" HTTPStatus="200" Bytes="12341" SourcePort="65432" DestinationPort="80"
I can get the raw data so each SourceIP/DestinationIP/HTTPStatus is totalled up fairly easily with
| stats count by SourceIP, DestinationIP, HTTPStatus
but getting beyond that to human readable with totals at each level is eluding me.
I have been able to group on two levels (one parent, and then count by each child "category") using stats and appendpipe but can't seem to find or figure out a method for doing this on multiple levels. I also tried to get there in Pivot with no further luck, but I have almost no experience with the Pivot interface so I may just be missing something. The data doesn't have to be laid out exactly like above, just so long as it totals up each level and provides a reasonably human readable format.
Anyone out there have any idea how this might be accomplished?
Thanks!
Ah, this is the perfect use case for this (still) undocumented command, multireport
:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
This gives you the data you need.
Now, you need to sort the rows into the order you'd like better. This kind of works:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
| sort SourceIP DestinationIP HTTPStatus
But that puts the summary rows for SourceIP and Destination IP at the end of their categories. To move them to the beginning, you need to do something hacky:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| sort SourceIP DestinationIP HTTPStatus
And then if you want you can do an "eval" to convert the "_ANY" back to an empty string or null. But that gives you things in lexicographic order. What if you want them in order of most common SourceIP, then most common DestinationIP? Then you'll have to do:
sourcetype=myhttpdata
| multireport
[ stats count as by SourceIP | eval count_sip=count ]
[ stats count as by SourceIP DestinationIP | eval count_sip_dip=count ]
[ stats count as by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| eventstats first(eval(coalesce(count_sip,null()))) as count_sip by SourceIP
| eventstats first(eval(coalesce(count_sip_dip,null()))) as count_sip_dip by SourceIP DestinationIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
| fields - count_sip count_sip_dip
Now, this gives similar results (but not the same, it lacks summary rows and instead uses different columns for the summary values):
sourcetype=myhttpdata
| eventstats count as count_sip by SourceIP
| eventstats count as count_sip_dip by SourceIP DestinationIP
| stats count first(count_sip) as count_sip
first(count_sip_dip) as count_sip_dip
by SourceIP DestinationIP HTTPStatus
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
but likely runs a lot slower. You could also use:
sourcetype=myhttpdata
| stats count by SourceIP DestinationIP HTTPStatus
| eventstats sum(count) as count_sip_dip by SourceIP DestinationIP
| eventstats sum(count_sip_dip) as count_sip by SourceIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
which will be a lot better, but for large data sets will still be slower than the version using multireport.
Ah, this is the perfect use case for this (still) undocumented command, multireport
:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
This gives you the data you need.
Now, you need to sort the rows into the order you'd like better. This kind of works:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
| sort SourceIP DestinationIP HTTPStatus
But that puts the summary rows for SourceIP and Destination IP at the end of their categories. To move them to the beginning, you need to do something hacky:
sourcetype=myhttpdata
| multireport
[ stats count by SourceIP ]
[ stats count by SourceIP DestinationIP ]
[ stats count by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| sort SourceIP DestinationIP HTTPStatus
And then if you want you can do an "eval" to convert the "_ANY" back to an empty string or null. But that gives you things in lexicographic order. What if you want them in order of most common SourceIP, then most common DestinationIP? Then you'll have to do:
sourcetype=myhttpdata
| multireport
[ stats count as by SourceIP | eval count_sip=count ]
[ stats count as by SourceIP DestinationIP | eval count_sip_dip=count ]
[ stats count as by SourceIP DestinationIP HTTPStatus ]
| fillnull value="_ANY_" SourceIP DestinationIP HTTPStatus
| eventstats first(eval(coalesce(count_sip,null()))) as count_sip by SourceIP
| eventstats first(eval(coalesce(count_sip_dip,null()))) as count_sip_dip by SourceIP DestinationIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
| fields - count_sip count_sip_dip
Now, this gives similar results (but not the same, it lacks summary rows and instead uses different columns for the summary values):
sourcetype=myhttpdata
| eventstats count as count_sip by SourceIP
| eventstats count as count_sip_dip by SourceIP DestinationIP
| stats count first(count_sip) as count_sip
first(count_sip_dip) as count_sip_dip
by SourceIP DestinationIP HTTPStatus
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
but likely runs a lot slower. You could also use:
sourcetype=myhttpdata
| stats count by SourceIP DestinationIP HTTPStatus
| eventstats sum(count) as count_sip_dip by SourceIP DestinationIP
| eventstats sum(count_sip_dip) as count_sip by SourceIP
| sort -count_sip +SourceIP -count_sip_dip +DestinationIP -count
which will be a lot better, but for large data sets will still be slower than the version using multireport.
several usable methods there, thanks gkanapathy! The last one seems to come out the cleanest, I think we'll stick with that. Any long searches can be scheduled overnight so not a big deal on the speed front.
actually the last method is likely to be faster than using multireport. Multireport doesn't attempt to ask the indexers to pre-compute intermediate aggregates for the reports, which will make a big difference for data that is distributed across many indexers.
another method for doing this is by appending an "ALL" value to each DestinationIP and HTTPStatus. E.g.
sourcetype=myhttpdata | eval DestinationIP = mvappend(DestinationIP,"ALL") | eval HTTPStatus=mvappend(HTTPStatus,"ALL") | stats count by SourceIP DestinationIP HTTPStatus
This has the additional benefit that if for some reason there are events that have a SourceIP but no DestinationIP or HTTPStatus, it will still get counted in the total for that SourceIP
Not the most elegant. But it does the trick.
I know you'll be able to tweak it to bend it to your will.
Or perhaps this will now inspire one of the smarter Splunkers to show me up! 😉
|stats count by SourceIP DestinationIP HTTPStatus
|stats sum(count) as DestCount list(HTTPStatus) as HTTPStatus list(count) as StatusCount by SourceIP DestinationIP|sort -SourceIP
|appendpipe [stats sum(DestCount) as Total_\For_\SourceIP by SourceIP]
|addtotals row=t col=t labelfield=" " fieldname=DestIPCount DestCount StatusCount label=NumRequests
|fields - DestCount
I tried for sometime. I could get the output in the format you need but only for a single SourceIP. The moment i included one more SourceIP to the dataset it failed.
By some other approach, the closest that i could get was this
Some Search terms ...| stats count by SourceIP DestinationIP HTTPStatus | stats list(HTTPStatus) as HTTPStatus list(count) as count by SourceIP DestinationIP | sort SourceIP