Splunk Search

Inner Join Returns Different Results When Reversing Search Order

ravencr0ss
New Member

Been working on a proof of concept that seems to be eluding me. From my work with SQL I would expect that an Inner Join would return the same results regardless of which search is the primary and which is the sub. However, this doesn't seem to be the case. Using the first query returns approximately 600 events, whereas if the searches are flipped to the second query it returns more than 3000 events. Am I missing something or is this just an oddity of how joins really work in Splunk?

Query 1 - Approximately 600 results.

sourcetype=mcafee:protection signature=7058
| join type=inner host [search sourcetype=XmlWinEventLog:Security]
| table host, signature_name, Name

Query 2 - More than 3000 results.

sourcetype=XmlWinEventLog:Security
| join type=inner host [search sourcetype=mcafee:protection signature=7058 ]
| table host, signature_name, Name

I have tried both constraining and opening the max values returned, constraining the time, and specifying specific fields in each query to no avail.

0 Karma
1 Solution

DMohn
Motivator

You are misunderstanding the way the join works here...

An inner join doesn't necessarily have to be a 1:1 join, but can be a n:1 or 1:n join - if set so in the options (see parameter max in the join reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join)

So I assume, your first primary search (base search AV data) returns about 600 results, for which the subsearch will return more than one result per host field - but only the first one will be matched, as this is the default setting for the max parameter.
The second primary search however returns around 3000 (or maybe more) results, and it will be a n:1 match, as you may have several entries per host in your WinEventLog which will be joined with the same McAfee log.

Try using join type=inner max=0 and see if the numer of results will be equal - which should be the case if my assumption is correct.

View solution in original post

0 Karma

DMohn
Motivator

You are misunderstanding the way the join works here...

An inner join doesn't necessarily have to be a 1:1 join, but can be a n:1 or 1:n join - if set so in the options (see parameter max in the join reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join)

So I assume, your first primary search (base search AV data) returns about 600 results, for which the subsearch will return more than one result per host field - but only the first one will be matched, as this is the default setting for the max parameter.
The second primary search however returns around 3000 (or maybe more) results, and it will be a n:1 match, as you may have several entries per host in your WinEventLog which will be joined with the same McAfee log.

Try using join type=inner max=0 and see if the numer of results will be equal - which should be the case if my assumption is correct.

0 Karma

ravencr0ss
New Member

I tried again with the max=0 and as before the results between the two are out of wack, but now I understand why. Originally I was under the impression the match criteria was on a 1:1 not 1:n. Thank you for the insight.

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...