I have
sourcetype
= sourcetype1
with field 1
, field 2
, field 3
, field 4
, part1
, key1
sourcetype
= sourcetype2
with field 1b
, field2b
, field3b
, field4b
, part2
, key2
The pseudo logic at a high level is
Join fields in sourcetye1
and sourcetype2
if part1
=part2
Else join fields in sourcetype1
and sourcetype2
if part1
NOT in part2
with key1
=key2
key
– is a concatenation of other fields within the sourcetype (i.e eval key=field4+field1+field5+field6
)
index=poc sourcetype=sourcetype1 | rename "Field 1" as Afield1 | rename "Field 2" AS Afield2, "Field 3" AS Apart1, "field 4" AS Afield4, “Field 5” AS Afield5, “Field 6” AS Afield6
| eval Akey=Afield2+Afield5+Afield6+Afield1
| stats max(_time) as latestA by Afield2, Apart1, Apart4, Apart5, Apart6, Afield1, Akey
| where status=30 AND Afield2=* AND Apart1* AND Afield4=*
| rename Apart1 as part
| join part type=outer [ search index=poc sourcetype=sourcetype2 | rename "Field 1" as Bfield1 | rename "Field 2" AS Bfield2, "Field 3" AS Bpart1, "field 4" AS Bfield4, “Field 5” AS Bfield5, “Field 6” AS Bfield6
| eval Bkey=Bfield2+Bfield5+Bfield6+Bfield1
| where B_appstatus = complete | stats max(_time) as latestA by Bfield2, Bpart1, Bpart4, Bpart5, Bpart6, Bfield1, Bkey
| rename Bpart1 as part ]
***| search NOT Bpart1=*
| rename Akey as key
| join key type=inner [search index=poc sourcetype=sourcetype2 | rename "Field 1" as Bfield1 | rename "Field 2" AS Bfield2, "Field 3" AS Bpart1, "field 4" AS Bfield4, “Field 5” AS Bfield5, “Field 6” AS Bfield6
| eval Bkey=Bfield2+Bfield5+Bfield6+Bfield1
| where B_appstatus = complete | stats max(_time) as latestA by Bfield2, Bpart1, Bpart4, Bpart5, Bpart6, Bfield1, Bkey
| rename Bkey as key*** ]
| table Field, part, Afield1, Bfield1
where the table should be a merge data of the first join (with common part
fields) and 2nd join (with common key
fields)
Okay, you are doing way too much work, and your pseudocode is just confusing the issue.
Pretend, for a moment, that you know LESS about what you are trying to do.
WWAAAYYY less.
Once you've solve the problem in the abstract, it's much easier to code the complete solution.
So, pretend there are only 3 fields. Pretend they are all named Field2 Field3 and Field4 so there are NO renames needed.
This gets you the latest record for each combination of values with sourcetype1, status 30 and values in Field2, Field3 and Field4:
index=POC Field2=* Field3=* Field4=* (sourcetype=sourcetype1 AND status=30)
| table _time sourcetype Field2 Field3 Field4
| dedup sourcetype Field2 Field3 Field4
This gets you the latest record for each combination of values with sourcetype2, with status complete and with values in Field2, Field3 and Field4. I'm assuming that there is some result field you are trying to get from the match, so I've called it FieldResult:
index=POC Field2=* Field3=* Field4=* (sourcetype=sourcetype2 AND appstatus=complete)
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup sourcetype Field2 Field3 Field4
Now, you want, for each event in the first set, the member in the second set with the same value for Field3. If there is no match, then you want the member in the second set that matches 2+4. You really wanted 2+5+6+1, but this is the simple version, right?
Now, we combine the above pulls like this...
index=POC Field2=* Field3=* Field4=*
((sourcetype=sourcetype1 AND status=30) OR (sourcetype=sourcetype2 AND appstatus=complete))
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup sourcetype Field2 Field3 Field4
| rename COMMENT as "Use something like this to kill the FieldResult value in sourcetype1 if it might exist, delete this code if not"
| eval FieldResult=if(sourcetype=sourcetype1,null(),FieldResult)
| rename COMMENT as "roll the values of FieldResult over from sourcetype2 to matching sourcetype1 records"
| eventstats latest(FieldResult) as FieldResult by Field3
| eventstats latest(FieldResult) as BackupFieldResult by Field2 Field4
| eval FieldResult=coalesce(FieldResult, BackupFieldResult,"((not found))")
| rename COMMENT as "now keep only the records from sourcetype1"
| where sourcetype=sourcetype1
Okay, now that's the general method. Think you can apply it to your problem?
A couple of quick notes -
1) Don't do join
if you can avoid it. They are inefficient in splunk relative to how they work in relational databases. Just throw all the records together in a pot and stir. The logic has to prepare each record type correctly while stirring, though.
2) Get rid of records as early as possible, preferably by not bringing them in in the first place. Don't do any work on any records until you have got rid of the ones that you have no use for. status=30 and status=complete belong in the search itself, not way down the list.
3) Minimize the renames. Only use them when they make your life MUCH easier.
4) eventstats
is a great way to copy data across "joinable" records. Use the as
rename so you have a clean landing spot, then throw away the records you don't need.
5) Not demonstrated here, but streamstats
is also a really useful tool when you have records that need to be related to each other. That will have to wait for another day.
for the 2nd source it's gotten complicated as it's not just referring to the latest record, some calculations have to be done to pick the right event and it vary per field. what's the most time efficient way to do it?
Okay, you are doing way too much work, and your pseudocode is just confusing the issue.
Pretend, for a moment, that you know LESS about what you are trying to do.
WWAAAYYY less.
Once you've solve the problem in the abstract, it's much easier to code the complete solution.
So, pretend there are only 3 fields. Pretend they are all named Field2 Field3 and Field4 so there are NO renames needed.
This gets you the latest record for each combination of values with sourcetype1, status 30 and values in Field2, Field3 and Field4:
index=POC Field2=* Field3=* Field4=* (sourcetype=sourcetype1 AND status=30)
| table _time sourcetype Field2 Field3 Field4
| dedup sourcetype Field2 Field3 Field4
This gets you the latest record for each combination of values with sourcetype2, with status complete and with values in Field2, Field3 and Field4. I'm assuming that there is some result field you are trying to get from the match, so I've called it FieldResult:
index=POC Field2=* Field3=* Field4=* (sourcetype=sourcetype2 AND appstatus=complete)
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup sourcetype Field2 Field3 Field4
Now, you want, for each event in the first set, the member in the second set with the same value for Field3. If there is no match, then you want the member in the second set that matches 2+4. You really wanted 2+5+6+1, but this is the simple version, right?
Now, we combine the above pulls like this...
index=POC Field2=* Field3=* Field4=*
((sourcetype=sourcetype1 AND status=30) OR (sourcetype=sourcetype2 AND appstatus=complete))
| table _time sourcetype Field2 Field3 Field4 FieldResult
| dedup sourcetype Field2 Field3 Field4
| rename COMMENT as "Use something like this to kill the FieldResult value in sourcetype1 if it might exist, delete this code if not"
| eval FieldResult=if(sourcetype=sourcetype1,null(),FieldResult)
| rename COMMENT as "roll the values of FieldResult over from sourcetype2 to matching sourcetype1 records"
| eventstats latest(FieldResult) as FieldResult by Field3
| eventstats latest(FieldResult) as BackupFieldResult by Field2 Field4
| eval FieldResult=coalesce(FieldResult, BackupFieldResult,"((not found))")
| rename COMMENT as "now keep only the records from sourcetype1"
| where sourcetype=sourcetype1
Okay, now that's the general method. Think you can apply it to your problem?
A couple of quick notes -
1) Don't do join
if you can avoid it. They are inefficient in splunk relative to how they work in relational databases. Just throw all the records together in a pot and stir. The logic has to prepare each record type correctly while stirring, though.
2) Get rid of records as early as possible, preferably by not bringing them in in the first place. Don't do any work on any records until you have got rid of the ones that you have no use for. status=30 and status=complete belong in the search itself, not way down the list.
3) Minimize the renames. Only use them when they make your life MUCH easier.
4) eventstats
is a great way to copy data across "joinable" records. Use the as
rename so you have a clean landing spot, then throw away the records you don't need.
5) Not demonstrated here, but streamstats
is also a really useful tool when you have records that need to be related to each other. That will have to wait for another day.
I wan to be able to flag the mismatch values between source1 and source2 for field3 and they key.. can that still be applied?