So, I have two indexes and sourcetypes with the following fields:
index1 and sourcetype1:
aip = 34.465.45.234
AppVendor = vendor1, vendor2, vendor3 (These are all from different events)
AppName = app2, app3, app1 (All from different events corresponding to position of the vendors above)
AppVersion = 3.0343, 1.354, 2.5465 (Same convention)
index2 and sourcetype2:
jsonevent.external_ip = 34.465.45.234
jsonevent.hostname = Host1
jsonevent.Username = User1
I use the following search:
(index=index1 sourcetype=sourcetype1) OR (index=index2 sourcetype=sourcetype2)
| rename jsonevent.external_ip as exip
| rename aip as agentip
| eval external_ip = coalesce(agentip, exip)
| stats values(jsonevent.hostname) as Hostnames, values(jsonevent.Username) as Users, values(AppVendor) as Vendors, values(AppName) as Applications, values(AppVersion) as Version by external_ip
| search Hostnames=* Users=* Vendors=* Applications=* Version=*
I get the following:
external_ip Hostnames Usernames Vendors Applications Version
34.465.45.234 Host1 User1 Vendor1 app1 1.354
Vendor2 app2 2.5465
Vendor3 app3 3.0343
What I want is the following:
external_ip Hostnames Usernames Vendors Applications Version
34.465.45.234 Host1 User1 Vendor1 app2 3.0343
34.465.45.234 Host1 User1 Vendor2 app3 1.354
34.465.45.234 Host1 User1 Vendor3 app1 2.5465
Does anyone have any ideas how to achieve this?
So, @ITWhisperer and @richgalloway. I combined both offered solutions into the following which did end up working:
(index=index1 sourcetype=sourcetype1) OR (index=index2 sourcetype=sourcetype2)
| rename jsonevent.external_ip as exip
| rename aip as agentip
| eval external_ip = coalesce(agentip, exip)
| eventstats list(jsonevent.hostname) as Hostnames, list(jsonevent.Username) as Users by external_ip
| eval hostuser = mvzip(Hostnames, Users)
| mvexpand hostuser
| eval HostUser = split(hostuser, ",")
| eval Hostnames=mvindex(HostUser, 0), Users=mvindex(HostUser, 1)
| rename AppVendor as Vendors, AppName as Applications, AppVersion as Version
| where isnotnull(Vendors)
| search Hostnames=*, Users=*
| table external_ip, Hostnames, Users, Vendors, Applications, Version
This gave me the result I wanted, but I also ended up with a caution that the list command reached a limit of 100 and had to truncate some events, and the search as a whole slows to a crawl. Is the list command that resource intensive?
Since your sample data doesn't show more than one host or user for each ip address, guessing that list would be useful to get what you needed was beyond my knowledge. Perhaps you could provide a more representative example of the events you are dealing with next time so we might be able to suggest a suitable solution (hopefully avoiding memory issues if possible).
Yes, the list function is resource-intensive. It makes a list of every value for the specified field so that takes time and memory.
The mvexpand command will split the multi-value fields into separate events. The problem is doing so breaks the relationships with other multi-value fields. To work around that, combine the three multi-value fields into a single multi-value field, use mvexpand, then split the fields apart.
| rename jsonevent.external_ip as exip
| rename aip as agentip
| eval external_ip = coalesce(agentip, exip)
| stats values(jsonevent.hostname) as Hostnames, values(jsonevent.Username) as Users, values(AppVendor) as Vendors, values(AppName) as Applications, values(AppVersion) as Version by external_ip
| eval tuple=mvzip(Hostnames, mvzip(Usernames, mvzip(Vendors, mvzip(Applications, Version))))
| mvexpand tuple
| eval tuple=split(tuple, ",")
| eval Hostnames=mvindex(tuple, 0), Usernames=mvindex(tuple, 1), Vendors=mvindex(tuple, 2), Applications=mvindex(tuple, 3),Version=mvindex(tuple, 4)
The problem with using values() is that the multivalue fields are each sorted lexicographically independently and the original relationship between the values from the events is lost before the mvzip/mvexpand/mvindex fixup.
Excellent point. My answer should use list rather than values.
Try something like this
(index=index1 sourcetype=sourcetype1) OR (index=index2 sourcetype=sourcetype2)
| rename jsonevent.external_ip as exip
| rename aip as agentip
| eval external_ip = coalesce(agentip, exip)
| eventstats values(jsonevent.hostname) as Hostnames, values(jsonevent.Username) as Usernames by external_ip
| rename AppVendor as Vendors, AppName as Applications, AppVersion as Version
| where isnotnull(Vendors)
| table external_ip Hostnames Usernames Vendors Applications Version
I used this search and it did work, however, something that I probably should have mentioned earlier is that multiple hosts and users are linked to the same external ip, so I am now getting multivalue fields for the Hostnames and Users. Anything that can be done for that? Or should I combine the two fields beforehand, then split them after the eventstats command?