Splunk Search

How do I find values that exist in two different indexes for a particular field?

bmoody3
New Member

I'm essentially looking to compare my index field values against an index that has known-bad field values to determine if these bad field values exist in my environment - namely ip values. I have a functioning search, however, the limitations of the join command [50,000] result kind of takes away from the effectiveness of the search. It outputs the following... "Subsearch produced 50000 results, truncating to maxout 50000".
Which other options do I have? Writing the second field to a lookup table? Using a different approach to the SPL?
My example code is below:

index=foo source=oof 
| fields blah, blah1, blah2
| rex field=_raw "(?\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| join ip
    [ search index=doo sourcetype="aaf" NOT ip="NULL"
    | dedup ip
    | fields ip] 
| table host ip

Unless I limit by time I'll hit the 50k cap using this, is there another approach anyone is aware of? Ideally this should output bad ip addresses that have been fond in my data.

0 Karma
1 Solution

jazzypai
Path Finder

Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?

If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).

(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1

Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.

View solution in original post

0 Karma

jazzypai
Path Finder

Is index foo is your data good ips?
Is index doo is your known bad ips?
Are you regexing your good ips?

If you combine both sets of data with the OR statement then you'll have all data in one searchable bucket. Then dedup by index and ip so that you have an ip from each index to compare. Then compare those two by counting by ip (if each index contains an ip then the count will be 2).

(index=foo source=foo) OR (index=doo sourcetype="aaf" NOT ip="NULL)
| rex field=_raw "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| dedup index ip
| stats count by ip
| where count > 1

Each result in this search means that it matches your good data (your indexed data) and is also on the known bad list.

0 Karma

bmoody3
New Member

Yes, the foo index contains good ips and the doo index contains bad ips. The regex on good ips pulls out all ip addresses from the index. The index with bad ips already has ips that are formatted and wouldn't need regex. Would this still work?

0 Karma

jazzypai
Path Finder

There are many ways to do a lot of things in Splunk. This should work but that rex will be attempting to extract from all events, including both indexes. If you can get ips extracted through props/transforms then that would be much better. Give it a shot and provide feedback.

0 Karma

bmoody3
New Member

This SPL works. Thanks jazzy!

0 Karma

Vijeta
Influencer

Try this

index=foo source=oof  OR ( index=doo sourcetype="aaf" NOT ip="NULL")
| dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| stats values(host) as host , values(blah) as blah, values(blah1) as blah1 by ip
0 Karma

bmoody3
New Member

Where does the actual comparison take place? I want to see if any values inside index2 exist in index1.

0 Karma

bmoody3
New Member

Could you explain the purpose of the first "OR"?

0 Karma

Vijeta
Influencer

@bmoody3 - for getting values that reside in index2 based on index 1 you could use eventstats .If the count of foo>0 and doo>0 from below result, then the ip is present in both the index

 index=foo source=oof  OR ( index=doo sourcetype="aaf" NOT ip="NULL")
     | dedup ip | rex field=_raw "(?<ip1>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})" | eval ip=coalesce(ip1,ip)| eventstats count(eval(index=foo)) as foo, count eval(index=doo) as doo by ip| where foo>0 and doo>0 | table <your fields>
0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...