Splunk Search

Is there a way to avoid the join command between indexes?

Esperteyu
Explorer

I'm trying to get my head around the alternatives, but can't see how I could get rid of the join in the following query:

 index="docverificationengine" "Issuing country does not match WR records for Sender" | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" | table senderId   | join senderId [ search index="senderverification" Verification "DocumentType\\\":2" | rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," | table senderId]

I have to admit though, that I don't have a clear concept of what would be a good performance. It takes around 4.5 seconds to run with a set of less than 2k in the "docverificationengine" index but over 300k in the "senderverification" one

Tags (2)
0 Karma
1 Solution

DalJeanis
Legend

There are several ways.

The following is the preferable, stats version, that has no limitations.

(index="docverificationengine" "Issuing country does not match WR records for Sender") 
OR (index="senderverification" Verification "DocumentType\\\":2" ) 
| rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" 
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| fields index senderId   
| stats dc(index) as foundboth by senderId
| where foundboth==2

CORRECTION - This method would only work if the field SenderId was an extracted field, which is apparently not the case.

Since you only are looking for items in the second search that are also in the first search, you could use a subsearch. In this case, either half could be put in braces as a subsearch to limit the other one's main search...

index="senderverification" Verification "DocumentType\\\":2"
    [index="docverificationengine" "Issuing country does not match WR records for Sender" 
    | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" '
    | table senderId]   
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| table senderId

The above method can be used as long as the number of results will not exceed a few hundred results.

Everything in braces will be returned to the outside search as ( ( senderId="value1" ) OR ( senderId="value2" ) ...) ... which will not work, since the field is not automatically search-time extracted. You could reformat the output to use it, but the alternate, stats version is preferable anyway, so this is left here as documentation only for anyone with similar issues where the field IS extracted.

View solution in original post

0 Karma

DalJeanis
Legend

There are several ways.

The following is the preferable, stats version, that has no limitations.

(index="docverificationengine" "Issuing country does not match WR records for Sender") 
OR (index="senderverification" Verification "DocumentType\\\":2" ) 
| rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" 
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| fields index senderId   
| stats dc(index) as foundboth by senderId
| where foundboth==2

CORRECTION - This method would only work if the field SenderId was an extracted field, which is apparently not the case.

Since you only are looking for items in the second search that are also in the first search, you could use a subsearch. In this case, either half could be put in braces as a subsearch to limit the other one's main search...

index="senderverification" Verification "DocumentType\\\":2"
    [index="docverificationengine" "Issuing country does not match WR records for Sender" 
    | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" '
    | table senderId]   
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| table senderId

The above method can be used as long as the number of results will not exceed a few hundred results.

Everything in braces will be returned to the outside search as ( ( senderId="value1" ) OR ( senderId="value2" ) ...) ... which will not work, since the field is not automatically search-time extracted. You could reformat the output to use it, but the alternate, stats version is preferable anyway, so this is left here as documentation only for anyone with similar issues where the field IS extracted.

0 Karma

nadlurinadluri
Communicator

Did you test the run time using the append command?

0 Karma

Esperteyu
Explorer

@nadlurinadluri no, how would that be?

0 Karma

nadlurinadluri
Communicator

This is the query for append, Just replacing join senderID with append.. Again, I am not exactly how much of a difference this is gonna make.

index="docverificationengine" "Issuing country does not match WR records for Sender"
| rex field=_raw "records for Sender [(?P[^]]+)]"
| table senderId
| append
[ search index="senderverification" Verification "DocumentType\\":2"
| rex field=_raw "queue: {\\\"SenderId\\\":(?\d+),"
| table senderId]

0 Karma

Esperteyu
Explorer

I made the assumption that I should not change the regular expressions, and by running this append alternative (which looks quicker, like half of the other one the very few times I ran it) I get a different set of results

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I'll take a guess at it.

(index="docverificationengine" "Issuing country does not match WR records for Sender") OR (index="senderverification" Verification "DocumentType\\\":2")  | rex "records for Sender \[(?P<senderId>[^\]]+)\]" | rex "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," |  stats count by senderId | fields - count
---
If this reply helps you, Karma would be appreciated.
0 Karma

Esperteyu
Explorer

Thanks richgalloway, unfortunately looks like there's something wrong as returns many id's that for instance don't have documentType=2 in the index="senderverification".

Bear in mind that there will be id's that can appear there with other documentType and still get the same message in index="docverificationengine"), and looks to me that your query returns those (i.e. esentially returning the "docverification" index match) but not sure what the

| stats count by senderId | fields - count

is intended to do

0 Karma

richgalloway
SplunkTrust
SplunkTrust

The last two clauses produce a list of unique senderId values.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...