Solved: Multiple Join Nested Join similar to IF ELSE LOGIC

cabauah · ‎06-20-2017

I have
sourcetype = sourcetype1 with field 1, field 2, field 3, field 4, part1, key1
sourcetype = sourcetype2 with field 1b, field2b, field3b, field4b, part2, key2

The pseudo logic at a high level is
Join fields in sourcetye1 and sourcetype2 if part1=part2
Else join fields in sourcetype1 and sourcetype2 if part1 NOT in part2 with key1=key2

key – is a concatenation of other fields within the sourcetype (i.e eval key=field4+field1+field5+field6)

index=poc sourcetype=sourcetype1 | rename "Field 1" as Afield1 | rename "Field 2" AS Afield2, "Field 3" AS Apart1, "field 4" AS Afield4, “Field 5” AS Afield5, “Field 6” AS Afield6 
| eval Akey=Afield2+Afield5+Afield6+Afield1
| stats max(_time) as latestA by Afield2, Apart1, Apart4, Apart5, Apart6, Afield1, Akey
| where status=30 AND Afield2=* AND Apart1* AND Afield4=* 
| rename Apart1 as part
| join part type=outer [ search index=poc sourcetype=sourcetype2 | rename "Field 1" as Bfield1 | rename "Field 2" AS Bfield2, "Field 3" AS Bpart1, "field 4" AS Bfield4, “Field 5” AS Bfield5, “Field 6” AS Bfield6 
| eval Bkey=Bfield2+Bfield5+Bfield6+Bfield1
| where B_appstatus = complete | stats max(_time) as latestA by Bfield2, Bpart1, Bpart4, Bpart5, Bpart6, Bfield1, Bkey
| rename Bpart1 as part ]  
***| search NOT Bpart1=*
| rename Akey as key 
| join key type=inner [search index=poc sourcetype=sourcetype2 | rename "Field 1" as Bfield1 | rename "Field 2" AS Bfield2, "Field 3" AS Bpart1, "field 4" AS Bfield4, “Field 5” AS Bfield5, “Field 6” AS Bfield6 
| eval Bkey=Bfield2+Bfield5+Bfield6+Bfield1
| where B_appstatus = complete | stats max(_time) as latestA by Bfield2, Bpart1, Bpart4, Bpart5, Bpart6, Bfield1, Bkey
| rename Bkey as key*** ]

| table Field, part, Afield1, Bfield1

where the table should be a merge data of the first join (with common part fields) and 2nd join (with common key fields)

DalJeanis · ‎06-21-2017

Okay, you are doing way too much work, and your pseudocode is just confusing the issue.

Pretend, for a moment, that you know LESS about what you are trying to do.

WWAAAYYY less.

Once you've solve the problem in the abstract, it's much easier to code the complete solution.

So, pretend there are only 3 fields. Pretend they are all named Field2 Field3 and Field4 so there are NO renames needed.

This gets you the latest record for each combination of values with sourcetype1, status 30 and values in Field2, Field3 and Field4:

index=POC  Field2=* Field3=* Field4=* (sourcetype=sourcetype1 AND status=30)
| table _time sourcetype Field2 Field3 Field4
| dedup  sourcetype Field2 Field3  Field4

This gets you the latest record for each combination of values with sourcetype2, with status complete and with values in Field2, Field3 and Field4. I'm assuming that there is some result field you are trying to get from the match, so I've called it FieldResult:

index=POC  Field2=* Field3=* Field4=* (sourcetype=sourcetype2 AND appstatus=complete)
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup  sourcetype Field2 Field3 Field4

Now, you want, for each event in the first set, the member in the second set with the same value for Field3. If there is no match, then you want the member in the second set that matches 2+4. You really wanted 2+5+6+1, but this is the simple version, right?

Now, we combine the above pulls like this...

index=POC Field2=* Field3=* Field4=*
((sourcetype=sourcetype1 AND status=30)   OR (sourcetype=sourcetype2 AND appstatus=complete))
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup  sourcetype Field2 Field3  Field4

| rename COMMENT as "Use something like this to kill the FieldResult value in sourcetype1 if it might exist, delete this code if not"
| eval  FieldResult=if(sourcetype=sourcetype1,null(),FieldResult) 

| rename COMMENT as "roll the values of FieldResult over from sourcetype2 to matching sourcetype1 records"
| eventstats latest(FieldResult) as FieldResult by Field3
| eventstats latest(FieldResult) as BackupFieldResult by Field2 Field4
| eval FieldResult=coalesce(FieldResult, BackupFieldResult,"((not found))")

| rename COMMENT as "now keep only the records from sourcetype1"
| where sourcetype=sourcetype1

Okay, now that's the general method. Think you can apply it to your problem?

A couple of quick notes -

1) Don't do join if you can avoid it. They are inefficient in splunk relative to how they work in relational databases. Just throw all the records together in a pot and stir. The logic has to prepare each record type correctly while stirring, though.

2) Get rid of records as early as possible, preferably by not bringing them in in the first place. Don't do any work on any records until you have got rid of the ones that you have no use for. status=30 and status=complete belong in the search itself, not way down the list.

3) Minimize the renames. Only use them when they make your life MUCH easier.

4) eventstats is a great way to copy data across "joinable" records. Use the as rename so you have a clean landing spot, then throw away the records you don't need.

5) Not demonstrated here, but streamstats is also a really useful tool when you have records that need to be related to each other. That will have to wait for another day.

View solution in original post

cabauah · ‎09-06-2017

for the 2nd source it's gotten complicated as it's not just referring to the latest record, some calculations have to be done to pick the right event and it vary per field. what's the most time efficient way to do it?

DalJeanis · ‎06-21-2017