How do I find rows in a table with more fields, wh...

aovsiannikov · ‎12-20-2018

I.e.

<search1>: ... | table id, f1, f2, f3
<search2>: ... | table id, f1, f2

I need to find all records in <search1> that are not equal to any record in <search2>

If I do something like

<search1> | search NOT [<search2>]

Splunk will not consider record, let's say <record1>: id=someID, f1=1, f2=2, f3=3 in <search1> different from record <record2>: id=someID, f1=1, f2=2 from <search2> , because field f3 will not be presented in the boolean expression generated by the subsearch. So it will not pass <record1> to final recordset, but I need it there.

So what is the best approach there?

aovsiannikov · ‎12-21-2018

Guys, thanks a lot for replies, I'd like to experiment a little bit before accepting. But I'd like to avoid to relay on exact fieldset in search 1 and 2, I'd like to have some unified approach. Means, I'd like to avoid using f3 explicitly in search. Let's imagine that fieldset for search 1 is defined in some variable, I don't know exact fieldset, it's configurable, I don't want to parse it. So the spl should be independent from what is in configuration. Also I have no _raw field, both search are came from lookup.

Now all I can imagine is to search index1 twice, something like this:
index=index1 | search NOT [index=index2 | search [index=index1]]

gcusello · ‎12-20-2018

Hi aovsiannikov,
if results in search2 are less than 50,000 you can use a subsearch like the following

index=index1 NOT [ search index=index2 | fields id f1 f2 f3]
| ...

where id, f1, f2, f3 are the fields to compare results from the searches.
If you want to compare the full record you can use as row the field _raw.

If instead results in search2 are more than 50,000, you have to use a different approach, something like this:

index=index1 OR index=index2
| stats count BY index id f1 f2 f3
| where index=index1 AND count=1

if instead you want to compare the full raw you can run something like this:

index=index1 OR index=index2
| stats count BY index _raw
| where index=index1 AND count=1

Bye.
Giuseppe

FrankVl · ‎12-21-2018

Assuming that id is actually an identifier of a record, it could be as simple as this, right?

search1 OR search2
| evenstats count by id
| search f3=* count=1

(the f3=* is to return the rows belonging to search1. if f3 is not always populated for those items, then use some other characterizing property of the items that came from search1).

If you also need to take f1 and f2 into account for matching items between the two data sets:

search1 OR search2
| evenstats count by id,f1,f2
| search f3=* count=1

briancronrath · ‎12-20-2018

Assuming these tables aren't over 50k rows (default max of a join subsearch) you could join the results together, and do some eval tricks to compare the data and filter it out. So something like:

 <search1>: ... | table id, f1, f2, f3 | join type=left id [ <search2>: ... | table id, f1, f2 | rename f1 as f1_x | rename f2 as f2_x] | eval filter_out=if(f1=f1_x AND f2=f2_x,"filter","keep") | search filter_out=keep

How do I find rows in a table with more fields, which are NOT EQUAL to any row in table with less fields?

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms