Splunk Search

How do you find data/values in a lookup that do not exist in the logs?

AnmolKohli
Explorer

We have a lookup file with a list of series stored in a field, TS_SERIES_ID. We want to find the count of series that do not exist in the logs, and we used the query below to achieve this.

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| search NOT [ search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID ] 
| stats distinct_count(TS_SERIES_ID)

Issue: the results are getting truncated because a subsearch cannot return more than 10K results (we need this limit to be around 300K, and the maximum that can be set in limits.conf is 10500).

Can you please let us know if there is any other way to achieve this?

Thanks

0 Karma

valiquet
Contributor

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT TS_SERIES_ID as matched
| where isnull(matched) OR matched==""
| stats distinct_count(TS_SERIES_ID)

Don't use subsearch and inputlookup

0 Karma

woodcock
Esteemed Legend

Like this:

index=web_timeseries 
| stats first(_time) BY SeriesUT.series{}
| rename SeriesUT.series{} AS TS_SERIES_ID
| eval which="search"
| apendpipe [
|inputlookup tss_usage_csv
| table TS_SERIES_ID
| eval which="lookup" ]
| stats dc(which) AS whichCount values(which) AS whiches BY TS_SERIES_ID
| where whichCount==1 AND whiches=="lookup"

woodcock
Esteemed Legend

This will be faster and scale better than any of the other solutions that might work.

0 Karma

AnmolKohli
Explorer

How can we display the count of distinct series IDs instead of displaying all 3 columns?
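For what it's worth, a minimal way to collapse that result into a single number (a sketch, untested; missing_series_count is just an illustrative name, not from the thread) is to add one more stats to the end of woodcock's search:

| where whichCount==1 AND whiches=="lookup"
| stats dc(TS_SERIES_ID) AS missing_series_count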

0 Karma

FrankVl
Ultra Champion

Nice one. Just note the typo in apendpipe; you're missing a p there 🙂

0 Karma

osakachan
Communicator

Why not use the lookup command the natural way? If you add a control value to match on, you can do this easily.

| inputlookup mylook
| eval flag="y"
| outputlookup mylook

Then in the search:

| amazing search...
| lookup mylook ... OUTPUT flag
| search NOT flag="*"

It will run on the indexer because of the late search, but you will get an output of the values that are not in the lookup.
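As a concrete sketch of that pattern using the names from this thread (untested; note it lists series from the logs that have no match in the lookup):

| inputlookup tss_usage_csv
| eval flag="y"
| outputlookup tss_usage_csv

Then in the search:

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID
| lookup tss_usage_csv TS_SERIES_ID OUTPUT flag
| search NOT flag="*"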

0 Karma

FrankVl
Ultra Champion

Try the search below. It simply appends the two lists of series together (so no expensive joins or restricted subsearches) and then counts the total occurrences of each TS_SERIES_ID, filtering for those series IDs that came from the lookup and occurred only once in the combined list (i.e. they did not occur in the list from the logs).

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| eval source="lookup"
| append [ 
        search index=web_timeseries | mvexpand SeriesUT.series{} | fields SeriesUT.series{} | rename SeriesUT.series{} as TS_SERIES_ID | eval source="logs" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1
0 Karma

AnmolKohli
Explorer

Not working. Getting the below error:

Subsearch produced 50000 results, truncating to maxout 50000

0 Karma

FrankVl
Ultra Champion

Ok, my bad, apparently that limit also applies when doing append.

Is the lookup also that big? Otherwise, try swapping the log search and the inputlookup:

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval source="logs"
| append [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1

Does that log search return only unique series, or is it full of duplicates? If the latter, you could also throw in a dedup (as you're only interested in the unique series anyway), as sketched below.
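A minimal sketch of where that dedup would go (untested; it only trims duplicate rows from the log side before the append):

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| dedup TS_SERIES_ID 
| eval source="logs"
| append [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1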

0 Karma

AnmolKohli
Explorer

The lookup has unique values.

Can we increase the subsearch limit from the existing 50K to 500K? Will that impact performance or affect other searches?
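For reference, the append command accepts a maxout option for its subsearch; a sketch of the swapped search with it raised (untested, and the value of 600000 is only illustrative):

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval source="logs"
| append maxout=600000 [ 
        | inputlookup tss_usage_csv 
        | table TS_SERIES_ID
        | eval source="lookup" ] 
| eventstats count by TS_SERIES_ID
| where source="lookup" AND count=1

Per the append documentation, maxout is still capped by the maxresultrows setting under [searchresults] in limits.conf (default 50000), so that would need to be raised as well, and raising it increases memory usage for searches.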

0 Karma

AnmolKohli
Explorer

Lookup has 571007 rows 😕

Any other way to achieve this?

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try this?

| inputlookup tss_usage_csv 
| table TS_SERIES_ID
| join type=left TS_SERIES_ID 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | eval temp=1 ] 
| where isnull(temp) 
| stats distinct_count(TS_SERIES_ID)
0 Karma

AnmolKohli
Explorer

Not working. Getting the below error:

Subsearch produced 50000 results, truncating to maxout 50000

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try the searches below?

| inputlookup tss_usage_csv 
| table TS_SERIES_ID 
| eval temp1=1 
| append 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | eval temp2=1 ] 
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp2) 
| stats count(TS_SERIES_ID) as count

AND

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval temp2=1 
| append 
    [ | inputlookup tss_usage_csv 
    | table TS_SERIES_ID 
    | eval temp1=1 ] 
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp1) 
| stats count(TS_SERIES_ID) as count
0 Karma

AnmolKohli
Explorer

Same error 😞

0 Karma

kamlesh_vaghela
SplunkTrust

Can you please provide the output from the two searches below?

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| stats distinct_count(TS_SERIES_ID) as count 

| inputlookup tss_usage_csv 
| table TS_SERIES_ID 
| stats distinct_count(TS_SERIES_ID) as count 
0 Karma

AnmolKohli
Explorer

Running the query for the last 7 days:

Output of #1 - 215830

Output of #2 - 571007

Both values are correct. We just need to subtract them now (not sure how).
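For reference, a minimal sketch of one way to combine those two single-row results and take the difference (untested; note the subtraction only equals the number of lookup series missing from the logs if every series seen in the logs also exists in the lookup):

| inputlookup tss_usage_csv 
| stats distinct_count(TS_SERIES_ID) as lookup_count
| appendcols 
    [ search index=web_timeseries 
    | mvexpand SeriesUT.series{} 
    | fields SeriesUT.series{} 
    | rename SeriesUT.series{} as TS_SERIES_ID 
    | stats distinct_count(TS_SERIES_ID) as log_count ] 
| eval missing_count = lookup_count - log_count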

0 Karma

kamlesh_vaghela
SplunkTrust

Hi @AnmolKohli

Can you please try the search below?

index=web_timeseries 
| mvexpand SeriesUT.series{} 
| fields SeriesUT.series{} 
| rename SeriesUT.series{} as TS_SERIES_ID 
| eval temp2=1
| append 
[ 
    | inputlookup tss_usage_csv 
    | table TS_SERIES_ID 
    | eval a=1 
    | accum a 
    | eval subset=a%50000 
    | stats values(TS_SERIES_ID) as TS_SERIES_ID by subset
    | eval temp1=1  
]
| mvexpand TS_SERIES_ID
| stats values(temp1) as temp1 values(temp2) as temp2 by TS_SERIES_ID 
| where isnull(temp1) 
| stats count(TS_SERIES_ID) as count
0 Karma

AnmolKohli
Explorer

Output is not correct.

0 Karma

AnmolKohli
Explorer

Any other way to achieve this?

0 Karma