Solved: Is there a way to avoid the join command between i...

Esperteyu · ‎08-24-2018

I'm trying to get my head around the alternatives, but can't see how I could get rid of the join in the following query:

 index="docverificationengine" "Issuing country does not match WR records for Sender" | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" | table senderId   | join senderId [ search index="senderverification" Verification "DocumentType\\\":2" | rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," | table senderId]

I have to admit though, that I don't have a clear concept of what would be a good performance. It takes around 4.5 seconds to run with a set of less than 2k in the "docverificationengine" index but over 300k in the "senderverification" one

DalJeanis · ‎08-29-2018

There are several ways.

The following is the preferable, stats version, that has no limitations.

(index="docverificationengine" "Issuing country does not match WR records for Sender") 
OR (index="senderverification" Verification "DocumentType\\\":2" ) 
| rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" 
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| fields index senderId   
| stats dc(index) as foundboth by senderId
| where foundboth==2

CORRECTION - This method would only work if the field SenderId was an extracted field, which is apparently not the case.

Since you only are looking for items in the second search that are also in the first search, you could use a subsearch. In this case, either half could be put in braces as a subsearch to limit the other one's main search...

index="senderverification" Verification "DocumentType\\\":2"
    [index="docverificationengine" "Issuing country does not match WR records for Sender" 
    | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" '
    | table senderId]   
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| table senderId

The above method can be used as long as the number of results will not exceed a few hundred results.

Everything in braces will be returned to the outside search as ( ( senderId="value1" ) OR ( senderId="value2" ) ...) ... which will not work, since the field is not automatically search-time extracted. You could reformat the output to use it, but the alternate, stats version is preferable anyway, so this is left here as documentation only for anyone with similar issues where the field IS extracted.

View solution in original post

DalJeanis · ‎08-29-2018

There are several ways.

The following is the preferable, stats version, that has no limitations.

(index="docverificationengine" "Issuing country does not match WR records for Sender") 
OR (index="senderverification" Verification "DocumentType\\\":2" ) 
| rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" 
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| fields index senderId   
| stats dc(index) as foundboth by senderId
| where foundboth==2

CORRECTION - This method would only work if the field SenderId was an extracted field, which is apparently not the case.

Since you only are looking for items in the second search that are also in the first search, you could use a subsearch. In this case, either half could be put in braces as a subsearch to limit the other one's main search...

index="senderverification" Verification "DocumentType\\\":2"
    [index="docverificationengine" "Issuing country does not match WR records for Sender" 
    | rex field=_raw  "records for Sender \[(?P<senderId>[^\]]+)\]" '
    | table senderId]   
| rex field=_raw "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," 
| table senderId

The above method can be used as long as the number of results will not exceed a few hundred results.

Everything in braces will be returned to the outside search as ( ( senderId="value1" ) OR ( senderId="value2" ) ...) ... which will not work, since the field is not automatically search-time extracted. You could reformat the output to use it, but the alternate, stats version is preferable anyway, so this is left here as documentation only for anyone with similar issues where the field IS extracted.

nadlurinadluri · ‎08-25-2018

Did you test the run time using the append command?

Esperteyu · ‎08-25-2018

@nadlurinadluri no, how would that be?

nadlurinadluri · ‎08-27-2018

This is the query for append, Just replacing join senderID with append.. Again, I am not exactly how much of a difference this is gonna make.

index="docverificationengine" "Issuing country does not match WR records for Sender"
| rex field=_raw "records for Sender [(?P[^]]+)]"
| table senderId
| append
[ search index="senderverification" Verification "DocumentType\\":2"
| rex field=_raw "queue: {\\\"SenderId\\\":(?\d+),"
| table senderId]

Esperteyu · ‎08-29-2018

I made the assumption that I should not change the regular expressions, and by running this append alternative (which looks quicker, like half of the other one the very few times I ran it) I get a different set of results

richgalloway · ‎08-24-2018

I'll take a guess at it.

(index="docverificationengine" "Issuing country does not match WR records for Sender") OR (index="senderverification" Verification "DocumentType\\\":2")  | rex "records for Sender \[(?P<senderId>[^\]]+)\]" | rex "queue: {\\\\\"SenderId\\\\\":(?<senderId>\d+)," |  stats count by senderId | fields - count

---
If this reply helps you, Karma would be appreciated.

Esperteyu · ‎08-24-2018

Thanks richgalloway, unfortunately looks like there's something wrong as returns many id's that for instance don't have documentType=2 in the index="senderverification".

Bear in mind that there will be id's that can appear there with other documentType and still get the same message in index="docverificationengine"), and looks to me that your query returns those (i.e. esentially returning the "docverification" index match) but not sure what the

| stats count by senderId | fields - count

is intended to do

richgalloway · ‎08-27-2018

The last two clauses produce a list of unique senderId values.

---
If this reply helps you, Karma would be appreciated.

Is there a way to avoid the join command between indexes?

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!