Solved: Full outer join

smolcj · ‎04-01-2013

How can I meet full outer join requirement in my search??
table a and table b with only one filed in two rows are same i want to display all rows from both the tables:
like , table a with 5 rows
and table b with 6 rows both with a common field thredname
and thread name is same in 2 rows. when i am using join type=outer i am getting 6 rows including the details from table a.
but my need is like i want 9 table where rows with common Threadname will have fully filled fields and other may have their own data.
example

Table a 
 Threadname  adata     
a1          aaa
a2          aa
a3          aa
ab1         aabb     
ab2         ab     



Table b

Threadname  bdata 
ab1         aabbb
ab2         aabbb
b1          bb
b2          bbb
b3          bbb
b4          bbb

Expected Resultant Table
Threadname  adata     bdata  
a1          aaa
a2          aa
a3          aa
ab1         aabb      aabbb
ab2         ab        aabbb
b1                    bb
b2                    bbb
b3                    bbb
b4                    bbb

please help
thank you

sideview · ‎04-04-2013

The join command's outer join is actually a "left outer" join technically.

To do what you want, you should either

1) don't use join at all, but use append and then stats

index=main source=file1 | stats count by Threadname adata | append [ search index=main source=file2 | stats count by Threadname bdata] | stats values(adata) as adata values(bdata) as bdata by Threadname

2) better, do it in a single search pipeline.

index=main (source=file1 OR source=file2) | stats values(adata) as adata values(bdata) as bdata by Threadname

I understand what you're saying about having to extract fields differently, and do different search language for one side versus the other, but too often people assume there's no way to do such things in a single pipeline. I recommend posting the full details of the searches in a separate question, asking whether you can get the desired end result without using join or append.

View solution in original post

sideview · ‎04-04-2013

The join command's outer join is actually a "left outer" join technically.

To do what you want, you should either

1) don't use join at all, but use append and then stats

index=main source=file1 | stats count by Threadname adata | append [ search index=main source=file2 | stats count by Threadname bdata] | stats values(adata) as adata values(bdata) as bdata by Threadname

2) better, do it in a single search pipeline.

index=main (source=file1 OR source=file2) | stats values(adata) as adata values(bdata) as bdata by Threadname

I understand what you're saying about having to extract fields differently, and do different search language for one side versus the other, but too often people assume there's no way to do such things in a single pipeline. I recommend posting the full details of the searches in a separate question, asking whether you can get the desired end result without using join or append.

nick405060 · ‎02-14-2020

Also, some people looking at this solution may want to be using stats list not stats values here, depending on your outer join use case

nick405060 · ‎02-13-2020

Why are people complacent about outer join functionality? Shouldn't we be questioning the SPL development skills here? Why do they even provide you with the outer option, if it's just going to be a left outer join? I would logically assume that the word "outer" means "full outer" here, especially considering I know there is already a command for an outer left join...

I understand why stats is better than join but why even offer the outer option with join...?

jaxjohnny · ‎11-11-2016

This worked for me also! Thank you!

mad4wknds · ‎05-31-2017

Worked like a charm

davebrooking · ‎04-04-2013

I think this is needs improving, but it does produce the results you've described.

I created a sourcetype of joinq that has 2 sources adata.txt and bdata.txt

The following search yields the table with 9 rows.

sourcetype=joinq source="adata.txt" | join threadname [search sourcetype=joinq source!="adata.txt"] | append [search sourcetype=joinq source="adata.txt" | join threadname type=outer [search sourcetype=joinq source!="adata.txt"]] | append [search sourcetype=joinq source!="adata.txt" | join threadname type=outer [search sourcetype=joinq source="adata.txt"]] | table threadname adata bdata | dedup threadname | sort threadname

Dave

kristian_kolb · ‎04-02-2013

Hmm, not sure I fully understand, but you might not require a join, but that depends on how you get your data...

If the 'tables' you mention are not tables, but sourcetypes, each row could be seen as an event, with adata and bdata being fields. Then something like;

... | stats first(adata) as adata first(bdata) as bdata by Threadname

would be something that could work.

You should probably give us a few sample events or a better description of your data.

/K

jaxjohnny · ‎11-11-2016

This worked for me . Thank you

jaxjohnny · ‎11-11-2016

This worked for me also. Thank you

smolcj · ‎04-04-2013

mmm i have to use a join for this search because i want to extract same field in two different column names so that i can differentiate it as data from different files so i need a join method that can provide full outer join values.. please help 😞

kristian_kolb · ‎04-03-2013

So.. then you could try:

index=main source=file1 OR source=file2 | stats first(adata) as adata first(bdata) as bdata by Threadname

assuming that there can be only one (or less) adata per Threadname, and only one (or less) bdata per Threadname.

smolcj · ‎04-03-2013

they are the values extracted from log files . Threadname adata and btada are the fields extracted from 2 different log files. table a from file1 and table b from file 2..
search may look like
index=main sorce=file1 |top Threadname adata | join type=outer Threadname [ search index=main sorce=file2 |top Threadname bdata]
but i am not able to obtain full outer join the output looks like
Threadname adata bdata

a1 aaa
a2 aa
a3 aa
ab1 aabb aabbb
ab2 ab aabbb

kristian_kolb · ‎04-03-2013

Thanks for that, but I was more interested in where the data comes from. Are they csv files, or the output of some splunk reporting command, or in separate events (i.e. table_a in one event, and table_b in another)?

/k

smolcj · ‎04-02-2013

Sorry if i confused you. edited the question. kindly please check it !!

Full outer join

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life