Solved: Why does this join not correctly match the data?

asherman · ‎08-21-2015

Hi,

I'm experiencing some strangeness with the following query:

index=main_index | dedup _raw | sort _raw | rename index as index_from | join type=left _raw [search index=dedupped_index | dedup _raw | sort _raw | rename index as index_to ]  | where isNull(index_to) | collect dedupped_index

Essentially, I want to take the contents of main_index, find out which events are not in dedupped_index, and move them into dedupped_index. However, if I run this query multiple times, I get the same results constantly dumped into the dedupped_index, even though they are already there.

Does the order of the events matter for the join command? Is there a different/better way of doing this that will work?

Thanks!

asherman · ‎08-25-2015

View solution in original post

asherman · ‎08-25-2015

MuS · ‎08-21-2015

It's always a pleasure to point to this nice answer http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... 🙂

Richfez · ‎08-21-2015

Ingenious, I like this. It seems like it shouldn't be necessary in my environment very often, but having the option could be handy.

Anyway.

If you leave off the " ... | collect dedupped_index" at the end, what does the search/join/subsearch show you? If that's not the right list of events, then I'd pare down the search bit by bit confirming each part really does what you expect.

asherman · ‎08-21-2015

It's strange. If I add fields to my query, those fields will always be (correctly) absent on the second run. It's only when I query the entire index that I see data when I shouldn't.

In any case, my current work around is as follows (it seems faster, but also feels hackier):

somesoni2 · ‎08-21-2015

Joins are expensive and the your workaround is indeed proper way to do this 🙂

Richfez · ‎08-21-2015

I agree with somesoni2 that you original way was not very efficient.

If the search you posted as the comment works correctly, I would recommend write that up as the answer to your own question and mark it as answered. That way people stumbling across this question in the future will know an "answer" was found.

Why does this join not correctly match the data?

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio