Hi,
I would like to join 2 tables with multiple fields based on common field Column 1 where Table:1 will have fields like
Table:1
Column1 Column2 Column3
xyz_sss_12 ghcgvcvb dsdffgcg
Bvc_tgg_hgh1 dfxxv hvhvhk
Bvc_tgg_hgh2 uhuhgjn jbjkjb
Bvc_tgg_hgh3 bvbmnm bnbn,m
Cdd_Tcc_Ydd cfghg ghghkj
D1 aafdfdf tgkgj
E dsfdfg cvcbb
Ftt_Tff_Ygg_1 fcxcvbvb ghvgnbvn
Ftt_Tff_Ygg_2 vbbnbb bvbbnnb
Table 2:
Column1 Column5 Column6
xyz_sss_* vbvnb b cvgbvb b
Bvc_tgg_hgh* bvnbnbb vbnbb
Cdd_Tcc_Ydd bvbbnm kkjkljlk
D* sdffg hjjhmn
E hvhjvb hvbhjm
Ftt_Tff_Ygg* fdgfgh hvbnvmn
Please help in joining Table:1 to Table:2 taking Column1 as a common field where for example xyz_sss_12 should map to xyz_sss_, Bvc_tgg_hgh1 and Bvc_tgg_hgh2 and Bvc_tgg_hgh3 should map to Bvc_tgg_hgh and likewise. A prompt response will be highly appreciated!
Like this:
|inputcsv table1 | appendpipe [|inputcsv table2] | eval joiner = Column1 | rex field=joiner mode=sed "s/_[^_]+$//" | stats values(*) AS * BY joiner
Or maybe this (depending on source of data)
source=table1 OR source=table2 | eval joiner = Column1 | rex field=joiner mode=sed "s/_[^_]+$//" | stats values(*) AS * BY joiner
@woodcock, this is a novel approach and absolutely brilliant... been trying to enrich data on the fly forEVER and nothing works the way I want it to, but I knew I was missing something. Lookups, joins, subsearches, appendcols, append and summary indexing have all failed me, but your answer came through. I guess I should've found it sooner. Anyway, I gave you a ton of rep points, in case you're still around.
Incidentally, it also works for mixing data types (lookup & indexed datas). For instance:
|inputlookup things.csv | eval joiner = upper(names) | append [search index=blegh sourcetype=blargh | eval joiner=upper(names) | fields stuff names thing1 thing2 thing3] | stats values(*) AS * by joiner
Works like a champ. Thanks again!
Yes, | stats values(*) AS * BY joiner
is SUPER powerful and can do inner/outer/left/right joins very flexibly and efficiently. Please do click Accept
to close the answer and clearly indicate a solution to others who may be having the same problem and are looking for a working solution.
BTW, your lookup
search suffers from the 50.5K subsearch limit. If you do it this way, it is unlimited:
index=blegh sourcetype=blargh | eval joiner=upper(names) | fields stuff names thing1 thing2 thing3
| appendpipe [|inputlookup things.csv | eval joiner = upper(names)]
| stats values(*) AS * BY joiner
Have you tried the Out-Of-The-Box join? as described at join
Try this
search source=table1 OR source=table2 | rex field=column1 "(?<newid>[A-Za-z_]+)" | eval newid=if(source=table2, column1, newid) | stats list(column1) as olddolvalues by newid
Hi Sundaresh,
I should have probably been more precise. I definitely need to use join the 2 tables first using the 1 field (Column1) for both the tables and I need to extract some more columns from both the tables to perform the calculation based on the extracted field. Where I am stuck is to extract the field values from Table-2 which has asterix at the end where as Table 1 has no asterix and instead has sub-processes. for example Table-1 has something like abc_xyz_pqr_1, abc_xyz_pqr_2, abc_xyz_pqr_3 or abc_xyz_pqr1, abc_xyz_pqr2, abc_xyz_pqr3 whereas Table-2 has only parent processes say only abc_xyz_pqr_(asterix) or abc_xyz_pqr(asterix). Also point to remember is we need to join Table-1--> Table-2 for the intricacy of the requirement that I have.
Kindly help if you can!
@anshumandas,
The asterix will also be ignore by the regex, so you will have two matching values (everything to the left of either digit or asterix). Now, to join these, you could do the join
command. Something like this might work
source=table1 | rex field=column1 "(?<newid>[A-Za-z_]+)" | join newid [search source=table1 | rex field=column1 "(?<newid>[A-Za-z_]+)" | table newid ]
You can look at the online documentation for the join
command and adjust the query to meet your needs.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join
I should caution you though, the join
command has limitation and is a slow process. There may be more efficient ways to get you the final result, if you can share more details on the data and result.