Join Two source based on content of fields present...

bladeboxe · ‎05-18-2015

Hi, I have BIG URGENT CASE here, and I'll appreciate your great help.

Here it is, I need this type of (SQL) query to be "translated" in Splunk since the sources are .csv files (extract of sql tables originally).

select
t1.col1, t1.col2, t2.cl1, t2.cl2
``from
( select col1,col2,col3,col4 from tab1 ) as t1
right join
( select cl1,cl2,cl3,cl4 from tab2 ) as t2
on t1.col3 like concat('%',t2.cl3.'%')
and t1.col4 like concat('%',t2.cl4.'%')
;

The thing is that I absolutely have nothing like "on col3=cl3" so that I have to dig to find if the value of col3 IS IN the string chain of the cl3.

I hope someone will be able to get the point and help me.
(oh, by the way, I tried, "append" and "join" but probably the wrong way (since the result went totally wrong)

Thank you for your great help.
B.

vganjare · ‎05-19-2015

Hi,

You can TAB1 & TAB2 information seperately. Once loaded, then you can try correlating the information together. I have stroed the information in following format:
Data.csv

lastname, name, service_tag, birthday
Rosenberg, Joseph, REF4A, 25/05/1982
Attali, Dan, REF4A, 18/07/1981
Salcherd, David, REF4B, 12/01/1959
Sherman, Gad, REF4C, 07/07/1974

Data1.csv

raining, price, services_concerned
communication, 1024€, REF4A
security, 1344€, REF4A:REF4B
managment, 2745€, REF4C

Please note: multiple services_concerned are represent as : (colon) seperated values.

Then, using following query, I was able to establish the relation.

|inputlookup Data.csv | join max=0 service_tag [|inputlookup Data1.csv | eval services_concerned=split(services_concerned,":") | mvexpand services_concerned | rename services_concerned as service_tag]

Following is the output generated:

birthday    lastname    name    price   raining service_tag
25/05/1982   Rosenberg   Joseph  1024€     communication   REF4A
25/05/1982   Rosenberg   Joseph  1344€     security    REF4A
18/07/1981   Attali  Dan     1024€     communication   REF4A
18/07/1981   Attali  Dan     1344€     security    REF4A
12/01/1959   Salcherd    David   1344€     security    REF4B
07/07/1974   Sherman     Gad     2745€     managment   REF4C

Thanks!!

bladeboxe · ‎05-19-2015

Hi,

Thank you already for you great he.
But the thing is that I have to join on multiple set of columns (like content of file1.col1 to be found in file2.col1 AND content of file1.col2 to be found also in file2.col2).
But on your example you only link 1 set of column.
Do you know how I can join on multiple sets?

Thank you!
B.

vganjare · ‎05-19-2015

you can try putting join max=0 type=outer service_tag col2 col3 col4

bladeboxe · ‎05-19-2015

Hi,

Thank you already for taking the time to answer.
Actually, I definitely get what you propose, but this assume that col2 and col3 and col4 are actually present in both files (which is not the case).
What I have is (for a set of columns) :

[first set of columns to compare]

FILE 1 : col2 - "REF4B" and in FILE 2 : colX - "we have all sort of comment where the REF4B is included eventually"

[second set of columns to compare]

FILE 1 : col3 - "ssh" and in FILE 2 : colY - "those services (sshd, ftpd) are running into this server"

And I need to cross the two sets.

I gave a simple example so that it would be more easier to get but actually it is about multiple sets.

Hope I didn't tired you already! Thanks!

Thanks,
B.

vganjare · ‎05-18-2015

Can you please share the sample csv data for both the tables?

bladeboxe · ‎05-18-2015

Hi, I'll try the following example to explain what is wanted

TAB 1 : Collaborateurs

lastname, name, service tag, birthday
Rosenberg, Joseph, REF4A, 25/05/1982
Attali, Dan, REF4A, 18/07/1981
Salcherd, David, REF4B, 12/01/1959
Sherman, Gad, REF4C, 07/07/1974

TAB 2: Training's Catalogue

training, price, services, concerned
communication, 1024€, Only REF4A
security, 1344€, REF4A and REF4B are concerned
managment, 2745€, REF4C

Result expected: Skills and costs per collaborator

lastname, name, training, price
Rosenberg, Joseph, communication, 1024€
Rosenberg, Joseph, security, 1344€
Attali, Dan, communication, 1024€
Attali, Dan, security, 1344€
Salcherd, David, security, 1344€
Sherman, Gad, managment, 2745€

I need that for each line of TAB 1, TAB 2 to be parsed line per line until we get the keyword from "service tag" (of tab1) into the chain in column "services concerned" (of tab2).

Thank you
B.

Join Two source based on content of fields present in other fields

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes

Welcome to the Splunk Community!