Splunk Search

How do you find data/values in a lookup that do not exist in the logs?

AnmolKohli
Explorer

We have a lookup file with a list of series stored in a field, TS_SERIES_ID. We want to find the count of series that do not exist in the logs, and we used the query below to achieve this.

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| search NOT [ search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID ] 
| stats distinct_count(TS_SERIES_ID)

Issue: the results are getting truncated because a subsearch cannot return more than 10K results (we need this limit to be around 300K, and the maximum that can be set in limits.conf is 10500).

Can you please let us know if there is any other way to achieve this?

Thanks

0 Karma

valiquet
Contributor

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT TS_SERIES_ID as matched
| where isnull(matched) OR matched==""
| stats distinct_count(TS_SERIES_ID)

Don't use subsearch and inputlookup

0 Karma

woodcock
Esteemed Legend

Like this:

index=web_timeseries 
| stats first(_time) BY SeriesUT.series{}
| rename SeriesUT.series{} AS TS_SERIES_ID
| eval which="search"
| apendpipe [
|inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval which="lookup" ]
| stats dc(which) AS whichCount values(which) AS whiches BY TS_SERIES_ID
| where whichCount==1 AND whiches=="lookup"

woodcock
Esteemed Legend

This will be faster and scale better than any of the other solutions that might work.

0 Karma

AnmolKohli
Explorer

How can we display the count of distinct series IDs instead of displaying all 3 columns?
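For what it's worth, a minimal way to collapse that result into a single number (a sketch, untested; missing_series_count is just an illustrative name, not from the thread) is to add one more stats to the end of woodcock's search:

| where whichCount==1 AND whiches=="lookup"
| stats dc(TS_SERIES_ID) AS missing_series_count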

0 Karma

FrankVl
Ultra Champion

Nice one. Just note the typo in apendpipe; you're missing a p there 🙂

0 Karma

osakachan
Communicator

Why not use the lookup command the natural way? If you add a control value to match on, you can do this easily.

| inputlookup mylook
| eval flag="y"
| outputlookup mylook

Then in the search:

| amazing search...
| lookup mylook ... OUTPUT flag
| search NOT flag="*"

It will run on the indexer because of the late search, but you will get an output of the values that are not in the lookup.
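As a concrete sketch of that pattern using the names from this thread (untested; note it lists series from the logs that have no match in the lookup):

| inputlookup tss_usage_csv
| eval flag="y"
| outputlookup tss_usage_csv

Then in the search:

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT flag
| search NOT flag="*"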

0 Karma

FrankVl
Ultra Champion

Try the search below. It simply appends the two lists of series together (so no expensive joins or restricted subsearches) and then counts the total occurrences of each TS_SERIES_ID, filtering for those series IDs that came from the lookup and occurred only once in the combined list (i.e. they did not occur in the list from the logs).

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| eval source="lookup"
| append [ 
        search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID | eval source="logs" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1
0 Karma

AnmolKohli
Explorer

Not working. Getting the below error:

Subsearch produced 50000 results, truncating to maxout 50000

0 Karma

FrankVl
Ultra Champion

Ok, my bad, apparently that limit also applies when doing append.

Is the lookup also that big? Otherwise, try swapping the log search and the inputlookup:

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval source="logs"
| append [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1

Does that log search return only unique series, or is it full of duplicates? If the latter, you could also throw in a dedup (as you're only interested in the unique series anyway), as sketched below.
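A minimal sketch of where that dedup would go (untested; it only trims duplicate rows from the log side before the append):

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| dedup TS_SERIES_ID 
| eval source="logs"
| append [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1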

0 Karma

AnmolKohli
Explorer

The lookup has unique values.

Can we increase the subsearch limit from the existing 50K to 500K? Will that impact performance or affect other searches?
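For reference, the append command accepts a maxout option for its subsearch; a sketch of the swapped search with it raised (untested, and the value of 600000 is only illustrative):

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval source="logs"
| append maxout=600000 [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1

Per the append documentation, maxout is still capped by the maxresultrows setting under [searchresults] in limits.conf (default 50000), so that would need to be raised as well, and raising it increases memory usage for searches.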

0 Karma

AnmolKohli
Explorer

Lookup has 571007 rows 😕

Any other way to achieve this?

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try this?

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| join type=left TS_SERIES_ID 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | eval temp=1 ] 
| where isnull(temp) 
| stats distinct_count(TS_SERIES_ID)
0 Karma

AnmolKohli
Explorer

Not working. Getting the below error:

Subsearch produced 50000 results, truncating to maxout 50000

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try the searches below?

| inputlookup tss_usage_csv 
| table TS_SERIES_ID 
| eval temp1=1 
| append 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | eval temp2=1 ] 
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp2) 
| stats count(TS_SERIES_ID) as count

AND

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval temp2=1 
| append 
    [ | inputlookup tss_usage_csv 
    | table TS_SERIES_ID 
    | eval temp1=1 ] 
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp1) 
| stats count(TS_SERIES_ID) as count
0 Karma

AnmolKohli
Explorer

Same error 😞

0 Karma

kamlesh_vaghela
SplunkTrust

Can you please provide the output from the two searches below?

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| stats distinct_count(TS_SERIES_ID) as count 

| inputlookup tss_usage_csv 
| table TS_SERIES_ID 
| stats distinct_count(TS_SERIES_ID) as count 
0 Karma

AnmolKohli
Explorer

Running the query for the last 7 days:

Output of #1 - 215830

Output of #2 - 571007

Both values are correct. We just need to subtract them now (not sure how).
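For reference, a minimal sketch of one way to combine those two single-row results and take the difference (untested; note the subtraction only equals the number of lookup series missing from the logs if every series seen in the logs also exists in the lookup):

| inputlookup tss_usage_csv 
| stats distinct_count(TS_SERIES_ID) as lookup_count
| appendcols 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | stats distinct_count(TS_SERIES_ID) as log_count ] 
| eval missing_count = lookup_count - log_count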

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try the search below?

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval temp2=1
| append 
[ 
    | inputlookup tss_usage_csv 
    | table TS_SERIES_ID 
    | eval a=1 
    | accum a 
    | eval subset=a%50000 
    | stats values(TS_SERIES_ID) as TS_SERIES_ID by subset
    | eval temp1=1  
]
| mvexpand TS_SERIES_ID
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp1) 
| stats count(TS_SERIES_ID) as count
0 Karma

AnmolKohli
Explorer

Output is not correct.

0 Karma

AnmolKohli
Explorer

Any other way to achieve this?

0 Karma