Splunk Search

Combine two record sets where a distinct value matches (without using join)

chinchin96
New Member

I have a search that generates two distinct types of record entries (searching for "for event"):

  1. 2015-05-05 for event 201216053940303kljdwlj for recipient Geogm
  2. 2015-05-05 card 12345678910 for event 201216053940303kljdwlj

Fields I created:

  1. 201216053940303kljdwlj = eventid

  2. Geogm = username (it's always 6
    characters long and at the end of the
    line)

  3. 12345678910 = cardnum

I want a table that shows the username and eventid of the first type of record and combines it with the card number where the eventid is a match, showing something like the following:

Time       , username , eventid               , cardnum
2015-05-05 , Geogm    , 201216053940303kljdwlj , 12345678910

I think I got this working using join, but would like a different way to achieve it due to performance issues of using join.

My current query:

index=myindex source="source1" OR source="source2"  "for recipient" 
| extract pairdelim="|;,", kvdelim="=:", auto=f  
| rex field=_raw "(?<username>\s......$)" 
| search username!=""  
| rex field=_raw "event (?<eventid>.(\w+))" 
| search eventid !="" 
| table _time username eventid 
| join eventid [search index=myindex source="source1" OR source="source2"  "for event" 
| extract pairdelim="|;,", kvdelim="=:", auto=f  
| rex field=_raw "card (?<cardnum>.(\w+))"  
| search cardnum !=""  
| rex field=_raw "event (?<eventid>.(\w+))"  
| search eventid !="" 
| table  eventid cardnum]

This is simlar to the question shown here: answers.splunk.com/answers/443909/how-do-i-joincombine-my-two-search-searches-to-get.html?utm_source=typeahead&utm_medium=newquestion&utm_campaign=no_votes_sort_relev but it did not receive an accepted answer

Tags (2)
0 Karma
1 Solution

somesoni2
Revered Legend

How about this?

index=myindex source="source1" OR source="source2"  "for recipient" 
| extract pairdelim="|;,", kvdelim="=:", auto=f
| rex field=_raw "event (?<eventid>.(\w+))" 
| search eventid !="" 
| rex field=_raw "(?<username>\s......$)" 
| rex field=_raw "card (?<cardnum>.(\w+))"
| eval timestmap=if(isnotnull(username),_time,null())
| stats values(timestamp) as _time values(username) as username values(cardnum) as cardnum by eventid

View solution in original post

0 Karma

woodcock
Esteemed Legend

Like this:

index=myindex (source="source1" OR source="source2")  ("for recipient" OR  "for event")
| extract pairdelim="|;,", kvdelim="=:", auto=f  
| rex "(?<username>\s......$)" 
| rex "card (?<cardnum>.(\w+))"  
| search username!=""  OR cardnum !="" 
| rex "event (?<eventid>.(\w+))" 
| search eventid !="" 
| stats min(_time) AS Time values(username) AS username values(cardnum) AS cardnum BY eventid
0 Karma

chinchin96
New Member

Thanks @woodcock! This worked as well. Also had to add

| table _time survey_uid partial_email  assigned_card

to get the column order right. So, much like @somesoni2, I love that your method works but I don't kno why. Can you walk me through the logic?

Also, both your query and @somesoni2 seem to be pretty fast. Does Splunk have a preference between the two?

0 Karma

woodcock
Esteemed Legend

The general approach to a merge is to pull in all the events with a single main search; that is why I put the ("for recipient" OR "for event") there. Then you can do a stats .. BY JoiningField to do the merge. You just need to decide what it is that you need to keep from the merge. doing stats values(*) AS * BY JoiningField gets you pretty much everything and you can trim down from that but in your case you knew exactly what you needed so I worked with that.

I suspect that mine is faster because the other one has 2 search passes but the only way to know for sure is to run each search on your data and use the Job Inspector to see how long each one really takes.

0 Karma

chinchin96
New Member

Thank @woodcock! This explanation is very useful!

0 Karma

somesoni2
Revered Legend

How about this?

index=myindex source="source1" OR source="source2"  "for recipient" 
| extract pairdelim="|;,", kvdelim="=:", auto=f
| rex field=_raw "event (?<eventid>.(\w+))" 
| search eventid !="" 
| rex field=_raw "(?<username>\s......$)" 
| rex field=_raw "card (?<cardnum>.(\w+))"
| eval timestmap=if(isnotnull(username),_time,null())
| stats values(timestamp) as _time values(username) as username values(cardnum) as cardnum by eventid
0 Karma

chinchin96
New Member

Thanks @somesoni2! This worked (after noticing timestamp was mispelled on line 7. :-))

The only issue I saw is that for records in the final display that didn't have a username (not all records do), there was not a timestamp shown even though one exists. I changed isnotnull to isnull and it seemed to work. I then added a

| table _time eventid username cardnum

to the end to get the order presented in the question.

So, while it all works now, I don't know that I could explain how. I'd love to be able to use your method again in the future but don't know I can follow the logic. Can you help me understand it?

0 Karma

Richfez
SplunkTrust
SplunkTrust

Shouldn't either line 1 or line 8 be "for recipient"?

0 Karma

chinchin96
New Member

Oops. You're right @rich7177. Just updated it.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...