Splunk Search

Join 2 tables for matching field values

anshumandas
New Member

Hi,

I would like to join 2 tables with multiple fields based on common field Column 1 where Table:1 will have fields like
Table:1

Column1 Column2 Column3
xyz_sss_12 ghcgvcvb dsdffgcg
Bvc_tgg_hgh1 dfxxv hvhvhk
Bvc_tgg_hgh2 uhuhgjn jbjkjb
Bvc_tgg_hgh3 bvbmnm bnbn,m
Cdd_Tcc_Ydd cfghg ghghkj
D1 aafdfdf tgkgj
E dsfdfg cvcbb
Ftt_Tff_Ygg_1 fcxcvbvb ghvgnbvn
Ftt_Tff_Ygg_2 vbbnbb bvbbnnb

Table 2:
Column1 Column5 Column6
xyz_sss_* vbvnb b cvgbvb b
Bvc_tgg_hgh* bvnbnbb vbnbb
Cdd_Tcc_Ydd bvbbnm kkjkljlk
D* sdffg hjjhmn
E hvhjvb hvbhjm
Ftt_Tff_Ygg* fdgfgh hvbnvmn

Please help in joining Table:1 to Table:2 taking Column1 as a common field where for example xyz_sss_12 should map to xyz_sss_, Bvc_tgg_hgh1 and Bvc_tgg_hgh2 and Bvc_tgg_hgh3 should map to Bvc_tgg_hgh and likewise. A prompt response will be highly appreciated!

Tags (2)
0 Karma

woodcock
Esteemed Legend

Like this:

|inputcsv table1 | appendpipe [|inputcsv table2] | eval joiner = Column1 | rex field=joiner mode=sed "s/_[^_]+$//" | stats values(*) AS * BY joiner

Or maybe this (depending on source of data)

source=table1 OR source=table2 | eval joiner = Column1 | rex field=joiner mode=sed "s/_[^_]+$//" | stats values(*) AS * BY joiner

proletariat99
Communicator

@woodcock, this is a novel approach and absolutely brilliant... been trying to enrich data on the fly forEVER and nothing works the way I want it to, but I knew I was missing something. Lookups, joins, subsearches, appendcols, append and summary indexing have all failed me, but your answer came through. I guess I should've found it sooner. Anyway, I gave you a ton of rep points, in case you're still around.

Incidentally, it also works for mixing data types (lookup & indexed datas). For instance:

|inputlookup things.csv | eval joiner = upper(names) | append [search index=blegh sourcetype=blargh | eval joiner=upper(names) |  fields  stuff names thing1 thing2 thing3] | stats values(*) AS * by joiner

Works like a champ. Thanks again!

0 Karma

woodcock
Esteemed Legend

Yes, | stats values(*) AS * BY joiner is SUPER powerful and can do inner/outer/left/right joins very flexibly and efficiently. Please do click Accept to close the answer and clearly indicate a solution to others who may be having the same problem and are looking for a working solution.

BTW, your lookup search suffers from the 50.5K subsearch limit. If you do it this way, it is unlimited:

index=blegh sourcetype=blargh | eval joiner=upper(names) | fields  stuff names thing1 thing2 thing3
| appendpipe [|inputlookup things.csv | eval joiner = upper(names)]
| stats values(*) AS * BY joiner
0 Karma

ddrillic
Ultra Champion

Have you tried the Out-Of-The-Box join? as described at join

0 Karma

sundareshr
Legend

Try this

search source=table1 OR source=table2 | rex field=column1 "(?<newid>[A-Za-z_]+)" | eval newid=if(source=table2, column1, newid) | stats list(column1) as olddolvalues by newid
0 Karma

anshumandas
New Member

Hi Sundaresh,

I should have probably been more precise. I definitely need to use join the 2 tables first using the 1 field (Column1) for both the tables and I need to extract some more columns from both the tables to perform the calculation based on the extracted field. Where I am stuck is to extract the field values from Table-2 which has asterix at the end where as Table 1 has no asterix and instead has sub-processes. for example Table-1 has something like abc_xyz_pqr_1, abc_xyz_pqr_2, abc_xyz_pqr_3 or abc_xyz_pqr1, abc_xyz_pqr2, abc_xyz_pqr3 whereas Table-2 has only parent processes say only abc_xyz_pqr_(asterix) or abc_xyz_pqr(asterix). Also point to remember is we need to join Table-1--> Table-2 for the intricacy of the requirement that I have.

Kindly help if you can!

0 Karma

sundareshr
Legend

@anshumandas,

The asterix will also be ignore by the regex, so you will have two matching values (everything to the left of either digit or asterix). Now, to join these, you could do the join command. Something like this might work

source=table1 | rex field=column1 "(?<newid>[A-Za-z_]+)" | join newid [search source=table1 | rex field=column1 "(?<newid>[A-Za-z_]+)" | table newid ]

You can look at the online documentation for the join command and adjust the query to meet your needs.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join

I should caution you though, the join command has limitation and is a slow process. There may be more efficient ways to get you the final result, if you can share more details on the data and result.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...