Splunk Search

How to search for values not present in the lookup table

akarunkumar321
Engager

Hi,

I Have a table-1 with tracking IDs ex: 123, 456, 789 and the other query which returns a table-2 with tracking ID's ex: 456, 789.
Now, I need a query which gives me a table-3 with the values which are not present in table-2 when compared with the table -1. I tried something like this.

 source=service1.log  earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\""  | table ProducerTrackingID |  search NOT [search source=service2.log  earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ConsumerTrackingID]

Can you please explain me the query. And Thanks a lot in advance.

0 Karma
1 Solution

VatsalJagani
SplunkTrust
SplunkTrust

Hi @raj_mpl,
Could you please try below splunk "set" command.

| set diff [<first-query>] [<second-query>]

Here is one small example.

| set diff [| makeresults | eval ID="10" | append [| makeresults | eval ID="15"] | table ID] [| makeresults | eval ID="10" | table ID]

View solution in original post

0 Karma

woodcock
Esteemed Legend

See my answers here for background:
https://answers.splunk.com/answers/567851/how-can-i-compare-mvfields-and-get-a-diff.html
https://answers.splunk.com/answers/734599/how-to-compare-the-same-search-from-the-previous-d.html

Start with this to create 2 fields with your data:

index=YouShouldAlwaysSpecifyAnIndex AND (source=service1.log  OR source=service2.log) earliest=-4h latest=now()
| rex field=_raw "trackingId\":\s\"(?<trackingId>[\w-]+)\"" 
| eval ProducerTrackingID = if(source=="service1.log, trackingId, null())
| eval ConsumerTrackingID = if(source=="service1.log, null() trackingId)
| stats values(*TrackingID) AS *TrackingID

For run anywhere, try this:

| makeresults 
| eval ProducerTrackingID="123 456 789", ConsumerTrackingID="456 789" 
| makemv ProducerTrackingID 
| makemv ConsumerTrackingID

Then you can EITHER do this:

| streamstats count AS _serial 
| multireport 
    [| mvexpand ProducerTrackingID
    | where ConsumerTrackingID!=ProducerTrackingID
    | rename ProducerTrackingID AS ProducerTrackingID_only] 
    [| mvexpand ConsumerTrackingID
    | where ConsumerTrackingID!=ProducerTrackingID
    | rename ConsumerTrackingID AS ConsumerTrackingID_only] 
| stats values(*) AS * BY _serial

OR this:

| nomv ConsumerTrackingID
| nomv ProducerTrackingID
| rex field=ConsumerTrackingID mode=sed "s/[\r\n\s]+/;/g"
| rex field=ProducerTrackingID mode=sed "s/[\r\n\s]+/;/g"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(ConsumerTrackingID, "(;|$)", "#1;"), ";"), split(replace(ProducerTrackingID, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
0 Karma

VatsalJagani
SplunkTrust
SplunkTrust

Hi @raj_mpl,
Could you please try below splunk "set" command.

| set diff [<first-query>] [<second-query>]

Here is one small example.

| set diff [| makeresults | eval ID="10" | append [| makeresults | eval ID="15"] | table ID] [| makeresults | eval ID="10" | table ID]
0 Karma

raj_mpl
Path Finder

@akarunkumar321 , You can try the below query and let me know

index=ccp source=service1.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ProducerTrackingID |join type=outer ProducerTrackingID [search index=service2.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\""|rename ConsumerTrackingID as ProducerTrackingID]|search NOT source=service1.log
0 Karma

akarunkumar321
Engager

This one giving me all the values in the table ProducerTrackingID. Before that, in the subquery search, we are just renaming the ConsumerTrackingID to be ProducerTrackingID. Do we don't require a table to hold these values?

0 Karma

Vijeta
Influencer

Where is the lookup table in your query, do you want the results based on lookup or index?

0 Karma

akarunkumar321
Engager

index=ccp source=service1.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ProducerTrackingID | search NOT [search index=service2.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ConsumerTrackingID]. The lookuptable is ConsumerTrackingID. I want the values which are not present in the ConsumerTrackingID Table.

0 Karma

Vijeta
Influencer

@akarunkumar321 I am still not clear on the question. The current query you have written, searches on 2 different indexes , I believe your subsearch says index=service2.log which looks like should be source=service2.log and index=ccp.

0 Karma

akarunkumar321
Engager

Sorry for the confusion both of them has the same index

0 Karma

Vijeta
Influencer

so is this query working fine? do you need the tracking id from this query to be compared against lookup table ?

0 Karma

akarunkumar321
Engager

So I have a producer and consumer tables which will be having the same trackingId's every time. If a tracking Id is been missed in the consumer table. But it is present in the Producer table. Then I need those values of the trackingIDs which are missed.

0 Karma

Vijeta
Influencer

Your query itself should work. Are you getting the tracking I’d valued correctly with your red command , have you tested it. What is the output you are getting with your query ? Also when you run individual query do you get tracking id values ?

0 Karma

akarunkumar321
Engager

That worked out. Thanks a lot. can we see producerTrackingId's and CosnumerTrackingId's in a different table as well?

0 Karma

Vijeta
Influencer

You already have them in 2 different indexes and you are doing this query to get the third list based on 2 indexes . You can store results in a lookup if you want but what is the use case to store it ?

0 Karma

akarunkumar321
Engager

Wanted to check out the data which we are sending and which is been received as well

0 Karma

akarunkumar321
Engager

That worked out. Thanks a lot. can we see producerTrackingId's and CosnumerTrackingId's in the different table as well?

0 Karma

akarunkumar321
Engager

No, it is not working, When I run the main query and subquery separately they both give me the tables with the values but whereas if I run it together in the subquery it always returns me the producer table with the trackingId values.

0 Karma

Vijeta
Influencer

Try this

source=service1.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ProducerTrackingID | search NOT [search source=service2.log earliest=-4h latest=now() | rex field=_raw "trackingId\":\s\"(?[\w-]+)\"" | table ConsumerTrackingID| rename ConsumerTrackingID as ProducerTrackingID]
0 Karma

akarunkumar321
Engager

Ok, I tried out the query which you have suggested. If you look into the image on the top left corner which I have shared you it shows me 5 events. Basically, it should be zero because I ran the query separately and all the tracking Id from the producer are present in the consumer service.

0 Karma

raj_mpl
Path Finder

@akarunkumar321 can you try splunk joins here

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...