Splunk Search

How can I find unique IDs that match one search but not another?

samdavies
Engager

I have events with this structure: { id, version, event_type }. The id field corresponds to a device ID. I'm trying to find all the unique devices (i.e., unique ids) that are still using version=V1 vs those that have upgraded to V2.

To find devices still using V1, I could do a search like index=my_index version="V1" | dedup id. The problem is that will match devices that sent some events while running V1, but have since upgraded to V2.

So let's suppose for example I have devices a,b,c. a has only used V1. b has used V1 and has since upgraded to V2. c has only sent events while using V2. So the set of events might look like this:

  1. { id=a, version=V1, event_type='alice' }
  2. { id=a, version=V1, event_type='bob' }
  3. { id=b, version=V1, event_type='carl' }
  4. { id=b, version=V2, event_type='dora' }
  5. { id=c, version=V2, event_type='eve' }
  6. { id=c, version=V2, event_type='fred' }

I'm trying to write 2 searches, one that will just return a (who has only used V1), and one which will return b & c (both of which have used V2).

index=my_index version="V2" | dedup id works fine for the second case (and returns [b,c]).
The similar index=my_index version="V1" | dedup id for the first case returns [a,b] and not just [a], so that's not the search I want.

Is there any way I can exclude b from the search, by excluding any events where the id field is also present in other events that have non-matching criteria? I.e. a search that dedups all events that match version=V1 (events 1,2,3), but then excludes b because event 3 has id=b, version=V2?

I've seen some similar questions that do something like | search NOT [search ... ] but I can't see how to make that work here, where I'm not just looking at events, but trying to compare two dedup'd lists.

Tags (3)
0 Karma
1 Solution

inventsekar
Ultra Champion

updated one -

sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw

View solution in original post

samdavies
Engager

The 'version' field is actually a bit more complicated that I put in my example above. There are lots of different versions of the app, and devices may have sent multiple events from older versions before upgrading to newer versions.

I am trying to filter out any devices that have sent events from the newer versions.

The exact search terms I'm using for the versions are

older versions: `(version="1.10*" OR version="1.11*" OR version="1.12*")`
newer versions: `(version="1.13*" OR version="1.14*" OR version="1.15*")`
0 Karma

inventsekar
Ultra Champion

updated one -

sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw

samdavies
Engager

Thanks, but that's not going to help in the case of my real data. Each device has sent many events (not just one at each level). So we could have had device B have sent 10 events at V1 and then 20 events at V2. I think this query would still pick out B.

Sorry if the question was misleading, I tried to make the example simpler so it was easier to explain.

0 Karma

inventsekar
Ultra Champion

Actually you explained the issue very clearly and nicely.

Ok, let's try this ...
If we use Uniq command, it will get us only "a"..

index=my_index id=a | 
 | uniq|table id
0 Karma

inventsekar
Ultra Champion

Hi SamDavies, may we know
1) you have only two versions or more than two.
2). Can you give us the sample version numbers please.

0 Karma

samdavies
Engager

Thanks - I've added this to a comment on the question (I can't see how to edit the question, sorry).

1) There are more than two versions for each of the set of older versions and set of newer versions.
2) Included in a comment on the question, but for example

older versions: (version="1.10*" OR version="1.11*" OR version="1.12*")
newer versions: (version="1.13*" OR version="1.14*" OR version="1.15*")
0 Karma

inventsekar
Ultra Champion

try this -

sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw
0 Karma

samdavies
Engager

Thanks - this works.

I've tweaked the search a bit - I don't think you need to capture 'older_version' or compare it to current_version - the condition that current_version (i.e. the most recent version) is in the set is enough.

0 Karma

inventsekar
Ultra Champion

Great to know that we found out the solution !!! as you are new member to this forum, you can accept this as an answer (and you can upvote as well 😉 )

0 Karma

sundareshr
Legend

Try this

*UPDATED*

index=my_index | rex field=version "(?<ver_no>\d+)" | eventstats max(ver_no) as current_version min(ver_no) as oldest_version by id | where current_version=1 AND current_version=oldest_version 

samdavies
Engager

Where does the "ver_no" value come from in this example? Should that be specified as the output from the rex command somehow?

0 Karma

sundareshr
Legend

Try the edited version

0 Karma

sundareshr
Legend

Try this

index=my_index | rex field=version "(?<ver_no>\d+)" | convert auto(ver_no) | eventstats max(ver_no) as current_version by id | where NOT (current_version=version)
0 Karma

samdavies
Engager

Thank-you - this approach works (it's similar to the one inventsekar has posted below also).

Using max() ... by id to get the most recent version from that ID was the key thing - that's really handy thing to learn about, thanks!

0 Karma

richgalloway
SplunkTrust
SplunkTrust

See the edited answer.

---
If this reply helps you, Karma would be appreciated.
0 Karma

richgalloway
SplunkTrust
SplunkTrust

Here's one solution, which is not as efficient as yours since it reads more data, but it should give the desired results.

index=my_index | dedup id | where version=V1 | ...
---
If this reply helps you, Karma would be appreciated.
0 Karma

samdavies
Engager

Thanks. Unfortunately when I try this query it returns no results.

What is Splunk operating on after the |dedup command? How does this pick out the right set of IDs?

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...