Solved: How do I find values that exist in two different i...

bmoody3 · ‎06-13-2019

I'm essentially looking to compare my index field values against an index that has known-bad field values to determine if these bad field values exist in my environment - namely ip values. I have a functioning search, however, the limitations of the join command [50,000] result kind of takes away from the effectiveness of the search. It outputs the following... "Subsearch produced 50000 results, truncating to maxout 50000".
Which other options do I have? Writing the second field to a lookup table? Using a different approach to the SPL?
My example code is below:

index=foo source=oof 
| fields blah, blah1, blah2
| rex field=_raw "(?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| join ip
    [ search index=doo sourcetype="aaf" NOT ip="NULL"
    | dedup ip
    | fields ip] 
| table host ip

Unless I limit by time I'll hit the 50k cap using this, is there another approach anyone is aware of? Ideally this should output bad ip addresses that have been fond in my data.

jazzypai · ‎06-13-2019

Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?

If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).

(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1

Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.

View solution in original post

jazzypai · ‎06-13-2019

Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?

If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).

(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1

Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.

bmoody3 · ‎06-13-2019

Yes, the foo index contains good ips and the doo index contains bad ips. The regex on good ips pulls out all ip addresses from the index. The index with bad ips already has ips that are formatted and wouldn't need regex. Would this still work?

jazzypai · ‎06-13-2019

There are many ways to do a lot of things in Splunk. This should work but that rex will be attempting to extract from all events, including both indexes. If you can get ips extracted through props/transforms then that would be much better. Give it a shot and provide feedback.

bmoody3 · ‎06-13-2019

This SPL works. Thanks jazzy!

Vijeta · ‎06-13-2019

Try this

index=foo source=oof  OR ( index=doo sourcetype="aaf" NOT ip="NULL")
| dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| stats values(host) as host , values(blah) as blah, values(blah1) as blah1 by ip

bmoody3 · ‎06-13-2019

Where does the actual comparison take place? I want to see if any values inside index2 exist in index1.

bmoody3 · ‎06-13-2019

Could you explain the purpose of the first "OR"?

Vijeta · ‎06-13-2019

@bmoody3 - for getting values that reside in index2 based on index 1 you could use eventstats .If the count of foo>0 and doo>0 from below result, then the ip is present in both the index

 index=foo source=oof  OR ( index=doo sourcetype="aaf" NOT ip="NULL")
     | dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| eventstats count(eval(index=foo)) as foo, count eval(index=doo) as doo by ip| where foo>0 and doo>0 | table <your fields>

How do I find values that exist in two different indexes for a particular field?

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes