I have 3 indexes containing events with IP addresses, index1, index2, and index3. My goal is to return a list of all IP addresses that are present in index1, but are not present in index2 or index3.
My current solution finds the IPs that are only in either index1 or (index2 or index3), using set diff, then intersects that result with index1 to limit the IPs to ones in index1:
| set intersect [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ | set diff [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ search (index=index2 OR index=index3) AND IpAddr earliest=-5d | dedup 1 IpAddr | rename IpAddr AS ip | table ip ] ]
This solution works, except each subsearch returns more than the maximum subsearch limit of 10500 (around 20000 results).
Does anyone have any idea how this search could be rewritten without using subsearches, to avoid this limitation?
Thanks for any help or ideas.
The final solution is:
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP
Thanks @mus for your help making this more efficient.
The first solution took up large amounts of memory (>500MB):
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips
Thanks @ppablo for the helpful link and the quick response.
The final solution is:
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP
Thanks @mus for your help making this more efficient.
The first solution took up large amounts of memory (>500MB):
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips
Thanks @ppablo for the helpful link and the quick response.
This search uses a significant amount of memory (>500MB). It appears that the output from values(IpAddr) AS ips2
is duplicated in memory for each values(ip) AS ips
when mvexpand ips
is executed. Is there a way to force Splunk to store only 1 copy of values(IpAddr)
in memory?
Yes, mvexpand
is the down side in this example ... Maybe a different approach could help here; How about counting the number of times an IP appears in the indexes and compare it that way?
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d
| dedup IpAddr keepempty=true
| eval myIP = coalesce(ip,IpAddr)
| stats dc(index) AS c_idx by myIP
| where c_idx = 1
This should show only IP's that are in one index.
cheers, MuS
I really like that idea; it seems really clean and simple. But I need the result to be IPs that are only in index1, with no IPs that are only in index2 or index3. Any thoughts how I could achieve this? Maybe a function similar to coalesce that could additionally assign a label, or maybe through an eval of some sort?
eval
is your friend 😉
| eval c_idx1=if(index="index1", 1, null())
and change the stats
and the where
to be
| stats dc(idx) AS c_idx sum(c_idx1) AS c_idx1 by myIP
| where c_idx=1 AND c_idx1=1
cheers, MuS
Thanks a lot for your help. I learned a bunch. I'll update the answer when I get the final solution finished.
Awesome, I'm glad the link helped you come up with an efficient search 🙂 Thanks for sharing your final solution with the community, and welcome to Splunk Answers!
Cheers
Patrick
Always good to see that my Q&A is helpful 🙂
Hi @jsilverstein
If you haven't seen this yet, this previous Q&A by @mus might give you some ideas on how to approach this:
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...