Splunk Search

How to search multiple indexes and join field values that don't exactly match?

StormTrooper
New Member

Hi,

I need to search in multiple indexes but the field values won't match exactly so a straight join will not produce results.

index=proxy Url="*" | join [search index=watchlist "*".domain."*"]

This is the search I am using, and while the syntax is OK, I don't know if it is doing what I want. The proxy index has the full URL, while the watchlist only has the domain itself, e.g. www.splunk.com

Any help appreciated.

1 Solution

yannK
Splunk Employee

If you want to join events per domain, you need to extract the domain into a field for both types of events, for example with the rex command. Then join the two sets of results on this new field.

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

Please adapt this to your actual field names and formats.
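
For anyone who wants to check the extract-then-join logic outside Splunk, here is a minimal Python sketch of the same idea: pull the host out of each value with a named capture group, then match events on that key. The sample events, the field names `url` and `domain`, and the helper `short_domain` are illustrative assumptions, not taken from the real indexes. Python writes a named group as `(?P<name>...)`, whereas rex takes `(?<name>...)`. The scheme is made optional here on the assumption that the watchlist holds bare hostnames.

```python
import re

# Same character class idea as the rex above; the scheme is optional here
# (an assumption) so bare hostnames from the watchlist also match.
DOMAIN_RE = re.compile(r"^(?:https?://)?(?P<shortdomain>[-_\w.]+)")


def short_domain(value):
    """Return the host part of a URL or bare hostname, or None."""
    match = DOMAIN_RE.match(value)
    return match.group("shortdomain") if match else None


# Hypothetical sample events standing in for the two indexes.
proxy_events = [
    {"url": "http://www.splunk.com/en_us/download.html"},
    {"url": "https://example.org/index.html"},
]
watchlist_events = [{"domain": "www.splunk.com"}]

# Inner-join equivalent: keep proxy events whose extracted host
# also appears in the watchlist.
watched = {short_domain(e["domain"]) for e in watchlist_events}
joined = [
    dict(e, shortdomain=short_domain(e["url"]))
    for e in proxy_events
    if short_domain(e["url"]) in watched
]
print(joined)
```

Only the first proxy event survives, because its extracted host is the only one present in the watchlist set.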


StormTrooper
New Member

Thank you for this, I got it working as I wanted.

P.S. sorry for the delay in replying I haven't had a chance to look at this for a while.


yannK
Splunk Employee

The website reformatted the rex command and stripped the name of the capture group.

There should be a group name in angle brackets after the question mark. I added it back in the example below:

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

StormTrooper
New Member

Thank you for these answers. @yannK, I tried your code but got the following error:

Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-

Any idea why? It all looks OK to me so I am not sure what I did wrong.
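
The error can be reproduced outside Splunk, which may help confirm that the pattern itself is at fault rather than the rest of the search. The sketch below uses Python's `re` module as a stand-in for the regex engine behind rex. That is an assumption: the two engines differ in details, and Python writes a named group as `(?P<name>...)`. In both, `(?` must be followed by a recognized extension such as a group name, so `(?[` is rejected at compile time.

```python
import re

# Pattern as it appears in the error message: the group name is missing.
broken = r"http(s|)://(?[-_\w\d.]*)"
# Same pattern with a named group (Python syntax; rex uses (?<shortdomain>...)).
fixed = r"http(s|)://(?P<shortdomain>[-_\w\d.]*)"


def compiles(pattern):
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(broken))
print(compiles(fixed))
```

The first pattern is rejected and the second compiles, which points to the missing group name as the cause.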


carpga
New Member

I think you may want to do an eval and rex command on the proxy Url to pull out the top level domain. I believe the join command is going to search for an exact match and I am trying to imagine scenarios where your Url and subsearch on the join won't match but should.
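
The exact-match point can be illustrated with a short Python sketch: the raw URL never equals the watchlist entry, but the hostname pulled out of it does. It uses the standard library's `urllib.parse` rather than a regex. The sample values and the helper name `host_of` are made up for illustration.

```python
from urllib.parse import urlsplit


def host_of(value):
    """Return the lowercase hostname of a URL or bare hostname."""
    # urlsplit only fills in the host when a scheme or "//" is present,
    # so bare hostnames get a "//" prefix first.
    if "//" not in value:
        value = "//" + value
    return urlsplit(value).hostname


proxy_url = "https://www.splunk.com/en_us/products.html"  # hypothetical
watchlist_domain = "www.splunk.com"                       # hypothetical

# Comparing the raw values fails; comparing the extracted hosts succeeds.
print(proxy_url == watchlist_domain)
print(host_of(proxy_url) == host_of(watchlist_domain))
```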
