Solved: Joining multiple events via a common field.

mgubser · ‎06-02-2014

So I have three sources that i need to join together to view as one event. The three sources are NewWFL, MoneyNEW, and new3Money.

Field I'm looking to use to join:

NewWFL: Document_Number

MoneyNEW: Document_Number and DocumentNo

new3Money: DocumentNo

Currently im using this search command

index=work_flow source="C:\\Users\\C754651\\Desktop\\John\\NewWFL.csv" Document_Number=* 
 | join type=left Document_Number [search index=Work_flow  source="C:\\Users\\C754651\\Desktop\\John\\MoneyNEW.csv"] 
 | Search index=Work_flow source="C:\\Users\\C754651\\Desktop\\John\\moneyNEW.csv" 
 | join type=left DocumentNo [search index=Work_flow source="C:\\Users\\C754651\\Desktop\\John\\new3money.csv"] 
 | fillnull value="N/A"

So how my data is it would be impossible for you to have a DocumentNo without a Document_Number. This search works but when I view my data, the join function has added some data from my third source to my first source. This is incorrect because I to get to the third source you have to have the second file to connect the two. Any event from my first file NewWFL that doesn't have a Document_Number it add the same data from one event in my third file. I can't see how to prevent this. All the entries with Document_Number(s) have the correct unique value. But if a entry doesn't have a Document_Number then they all have the same incorrect data from new3Money.

Any suggestions would help. I'm not 100% sure join is the correct function for this but my task is to be able to connect these different sources and be able to use the fields from the sources as if it was one big event.

somesoni2 · ‎06-02-2014

Try this

index=work_flow source="C:\\Users\\C754651\\Desktop\\John\\NewWFL.csv" Document_Number=* 
 | join type=left Document_Number [search index=Work_flow  source="C:\\Users\\C754651\\Desktop\\John\\MoneyNEW.csv" | stats count by Document_Number,DocumentNo] 
 | join type=left DocumentNo [search index=Work_flow source="C:\\Users\\C754651\\Desktop\\John\\new3money.csv"] 
 | fillnull value="N/A"

This will be really slow. If possible apply any statistical operations before join (based on your final need) so that no of records to be joied are less.

View solution in original post

somesoni2 · ‎06-02-2014

Try this

index=work_flow source="C:\\Users\\C754651\\Desktop\\John\\NewWFL.csv" Document_Number=* 
 | join type=left Document_Number [search index=Work_flow  source="C:\\Users\\C754651\\Desktop\\John\\MoneyNEW.csv" | stats count by Document_Number,DocumentNo] 
 | join type=left DocumentNo [search index=Work_flow source="C:\\Users\\C754651\\Desktop\\John\\new3money.csv"] 
 | fillnull value="N/A"

This will be really slow. If possible apply any statistical operations before join (based on your final need) so that no of records to be joied are less.

somesoni2 · ‎06-02-2014

I guess it was due to the additional "|search" command you added in the middle. It was basically just filtering data from the join result of your source1 and source2. Also, you're doing join on the full data from source2 which was overwriting your important fields like 'source' for matching records. Basically in join, include only the fields which are required (which I did by using stats).

mgubser · ‎06-02-2014

This worked thank you much, caused alot of frustration. Do you know why it was applying that event data to all the events incorrectly?

Ayn · ‎06-02-2014

Seems to me you should be using the transaction function.

... | transaction Document_Number DocumentNo

http://docs.splunk.com/Documentation/Splunk/6.1/SearchReference/Transaction

mgubser · ‎06-02-2014

The only problem when using transaction, is I am unsure as to how to see uncommon fields in a table. The only fields it will show when i click on the multi-line event is fields they have in common. I need to see all the fields and be able to have that in a table.

Joining multiple events via a common field.

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!