Splunk Search

Why does this join not correctly match the data?

asherman
Path Finder

Hi,

I'm experiencing some strangeness with the following query:

index=main_index | dedup _raw | sort _raw | rename index as index_from | join type=left _raw [search index=dedupped_index | dedup _raw | sort _raw | rename index as index_to ]  | where isNull(index_to) | collect dedupped_index

Essentially, I want to take the contents of main_index, find out which events are not in dedupped_index, and move them into dedupped_index. However, if I run this query multiple times, I get the same results constantly dumped into the dedupped_index, even though they are already there.

Does the order of the events matter for the join command? Is there a different/better way of doing this that will work?

Thanks!

Tags (3)
0 Karma
1 Solution

asherman
Path Finder

(index=main_index OR index=dedupped_index) | dedup _raw, index | stats count as count values(index) as index by _raw | table count, _raw, index | where count=1 AND index="main_index" | collect index=dedupped_index

View solution in original post

0 Karma

asherman
Path Finder

(index=main_index OR index=dedupped_index) | dedup _raw, index | stats count as count values(index) as index by _raw | table count, _raw, index | where count=1 AND index="main_index" | collect index=dedupped_index

0 Karma

MuS
Legend
0 Karma

Richfez
SplunkTrust
SplunkTrust

Ingenious, I like this. It seems like it shouldn't be necessary in my environment very often, but having the option could be handy.

Anyway.

If you leave off the " ... | collect dedupped_index" at the end, what does the search/join/subsearch show you? If that's not the right list of events, then I'd pare down the search bit by bit confirming each part really does what you expect.

0 Karma

asherman
Path Finder

It's strange. If I add fields to my query, those fields will always be (correctly) absent on the second run. It's only when I query the entire index that I see data when I shouldn't.

In any case, my current work around is as follows (it seems faster, but also feels hackier):

(index=main_index OR index=dedupped_index) | dedup _raw, index | stats count as count values(index) as index by _raw | table count, _raw, index | where count=1 AND index="main_index" | collect index=dedupped_index

0 Karma

somesoni2
Revered Legend

Joins are expensive and the your workaround is indeed proper way to do this 🙂

Richfez
SplunkTrust
SplunkTrust

I agree with somesoni2 that you original way was not very efficient.

If the search you posted as the comment works correctly, I would recommend write that up as the answer to your own question and mark it as answered. That way people stumbling across this question in the future will know an "answer" was found.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...