Deduping when working with multiple sourcetypes

msarro
Builder

Hey everyone, I'm working on an issue right now and I'm running into a problem with my understanding of how Splunk works. I have multiple sourcetypes, and each one goes into its own index (the times at which the individual sources are received are extremely staggered, and this is what Splunk recommended to deal with that). I need to correlate these four sources together, so my query starts like this:

index=prodcorr OR index=premed_bts OR index=premed_pbts
| rename MSP_vqmcallid AS Call_ID_1
| rename PBTS_ORIG_SIP_CALL_ID AS Call_ID_2
| rename PBTS_TERM_SIP_CALL_ID AS Call_ID_3
| rename BTS_ORIG_SIP_CALL_ID AS Call_ID_3
| transaction Call_ID_1,Call_ID_2,Call_ID_3 maxspan=1d keepevicted=true

The issue comes in with some of my sources: they may have multiple records with the same call ID, so I need to dedup them before the transaction command. If I run dedup on a field from one sourcetype (call it sourcetype A), will that also get rid of the events from the other sourcetypes in the search (B, C, and D)? Right now I'm having a heck of a time getting any kind of correlation to work because of the multiple events for each of my sources. Sadly, the call ID is the only way to reliably correlate the records.

Further, is there a way to dedup by specifying a field, and then excluding all events where a certain field is null? For instance, event1 has a call id field of abc123 and a Call_Type field of Network. Event2 and event3 have a call id field of abc123, and a Call_Type field that is null. I'd want to keep event1, and ignore event2 and event3. I want to do this for each of the sourcetypes included in my search (so that's a total of four sourcetypes). Any suggestions would be appreciated. Hopefully I explained this well enough.


piebob
Splunk Employee

just wanted to compliment you on an extremely well-written question. thanks!


hexx
Splunk Employee

The dedup search command has a "keepempty" option that might fit your use case:

keepempty
Syntax: keepempty=<bool> 
Description: If an event contains a null value for one or more of the specified fields, the event is either retained (T) or discarded (default, F).
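A quick way to see this effect on your own data (only a diagnostic sketch; the final stats line is there just to make the per-sourcetype difference visible) is to dedup on the first ID alone and count what survives per sourcetype:

index=prodcorr OR index=premed_bts OR index=premed_pbts
| rename MSP_vqmcallid AS Call_ID_1
| dedup Call_ID_1
| stats count by sourcetype

With the default keepempty=F, only sourcetypes whose events have a Call_ID_1 value should show up in the counts; changing the dedup line to | dedup Call_ID_1 keepempty=T should bring the other sourcetypes back.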

By default (keepempty=F), a plain | dedup Call_ID_1 discards every event that has no Call_ID_1 value, which includes every event from the sourcetypes that don't carry that field; that is what would wipe out sourcetypes B, C, and D in your first question. As I understand it, you could instead use three consecutive dedups with keepempty=T to:

  • Retain the most recent event for each unique value of the "Call_ID_1" field, plus all events that don't have a "Call_ID_1" field: | dedup Call_ID_1 keepempty=T
  • Retain the most recent event for each unique value of the "Call_ID_2" field, plus all events that don't have a "Call_ID_2" field: | dedup Call_ID_2 keepempty=T
  • Retain the most recent event for each unique value of the "Call_ID_3" field, plus all events that don't have a "Call_ID_3" field: | dedup Call_ID_3 keepempty=T

In the context of your search command, this would look something like:

index=prodcorr OR index=premed_bts OR index=premed_pbts
| rename MSP_vqmcallid AS Call_ID_1 PBTS_ORIG_SIP_CALL_ID AS Call_ID_2 PBTS_TERM_SIP_CALL_ID AS Call_ID_3 BTS_ORIG_SIP_CALL_ID AS Call_ID_3
| dedup Call_ID_1 keepempty=T | dedup Call_ID_2 keepempty=T | dedup Call_ID_3 keepempty=T
| transaction Call_ID_1,Call_ID_2,Call_ID_3 maxspan=1d keepevicted=true

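As for ignoring events that don't have a given field, you can simply filter them out with a search term such as Call_Type=*, which matches only events where the field has a value. (NOT Call_Type=* does the opposite and keeps only the events that lack the field.)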
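Applied to the Call_Type example from the question, a possible sketch is to add Call_Type=* to the base search so that only events where Call_Type has a value reach the dedups. This assumes Call_Type is extracted for every sourcetype you want to keep; if one of the sourcetypes never has a Call_Type field, all of its events would be dropped, and you would need to scope the condition to the index that does have it instead (for example (index=prodcorr Call_Type=*) OR index=premed_bts OR index=premed_pbts, where prodcorr is only a placeholder choice):

(index=prodcorr OR index=premed_bts OR index=premed_pbts) Call_Type=*
| rename MSP_vqmcallid AS Call_ID_1 PBTS_ORIG_SIP_CALL_ID AS Call_ID_2 PBTS_TERM_SIP_CALL_ID AS Call_ID_3 BTS_ORIG_SIP_CALL_ID AS Call_ID_3
| dedup Call_ID_1 keepempty=T | dedup Call_ID_2 keepempty=T | dedup Call_ID_3 keepempty=T
| transaction Call_ID_1,Call_ID_2,Call_ID_3 maxspan=1d keepevicted=true

The parentheses around the index terms keep the OR group separate from the Call_Type=* condition, so the filter applies to all three indexes.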

msarro
Builder

My apologies, I had a typo in this. It is now fixed.


Lowell
Super Champion

Do you actually have 3 different IDs that you are trying to link together? I'm slightly confused by this. Are you renaming "PBTS_ORIG_SIP_CALL_ID" twice? Do the different fields come in from different sourcetypes/indexes, or are they all mixed around? I suspect a few example events would help clear things up.
