I have events with this structure: { id, version, event_type }
. The id
field corresponds to a device ID. I'm trying to find all the unique devices (i.e., unique id
s) that are still using version=V1 vs those that have upgraded to V2.
To find devices still using V1, I could do a search like index=my_index version="V1" | dedup id
. The problem is that will match devices that sent some events while running V1, but have since upgraded to V2.
So let's suppose for example I have devices a,b,c
. a
has only used V1. b
has used V1 and has since upgraded to V2. c
has only sent events while using V2. So the set of events might look like this:
{ id=a, version=V1, event_type='alice' }
{ id=a, version=V1, event_type='bob' }
{ id=b, version=V1, event_type='carl' }
{ id=b, version=V2, event_type='dora' }
{ id=c, version=V2, event_type='eve' }
{ id=c, version=V2, event_type='fred' }
I'm trying to write 2 searches, one that will just return a
(who has only used V1), and one which will return b
& c
(both of which have used V2).
index=my_index version="V2" | dedup id
works fine for the second case (and returns [b,c]
).
The similar index=my_index version="V1" | dedup id
for the first case returns [a,b]
and not just [a]
, so that's not the search I want.
Is there any way I can exclude b
from the search, by excluding any events where the id
field is also present in other events that have non-matching criteria? I.e. a search that dedups all events that match version=V1
(events 1,2,3), but then excludes b
because event 3 has id=b
, version=V2
?
I've seen some similar questions that do something like | search NOT [search ... ]
but I can't see how to make that work here, where I'm not just looking at events, but trying to compare two dedup'd lists.
updated one -
sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw
The 'version' field is actually a bit more complicated that I put in my example above. There are lots of different versions of the app, and devices may have sent multiple events from older versions before upgrading to newer versions.
I am trying to filter out any devices that have sent events from the newer versions.
The exact search terms I'm using for the versions are
older versions: `(version="1.10*" OR version="1.11*" OR version="1.12*")`
newer versions: `(version="1.13*" OR version="1.14*" OR version="1.15*")`
updated one -
sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw
Thanks, but that's not going to help in the case of my real data. Each device has sent many events (not just one at each level). So we could have had device B have sent 10 events at V1 and then 20 events at V2. I think this query would still pick out B.
Sorry if the question was misleading, I tried to make the example simpler so it was easier to explain.
Actually you explained the issue very clearly and nicely.
Ok, let's try this ...
If we use Uniq command, it will get us only "a"..
index=my_index id=a |
| uniq|table id
Hi SamDavies, may we know
1) you have only two versions or more than two.
2). Can you give us the sample version numbers please.
Thanks - I've added this to a comment on the question (I can't see how to edit the question, sorry).
1) There are more than two versions for each of the set of older versions and set of newer versions.
2) Included in a comment on the question, but for example
older versions: (version="1.10*" OR version="1.11*" OR version="1.12*")
newer versions: (version="1.13*" OR version="1.14*" OR version="1.15*")
try this -
sourcetype=uniqueidd | rex field=_raw "version=(?<version_number>[^,]+)" | eventstats max(version_number) as current_version min(version_number) as older_version by id | where like(current_version,"%1.10%") OR like(current_version,"%1.11%") OR like(current_version,"%1.12%") AND current_version=older_version | table current_version older_version _raw
Thanks - this works.
I've tweaked the search a bit - I don't think you need to capture 'older_version' or compare it to current_version - the condition that current_version (i.e. the most recent version) is in the set is enough.
Great to know that we found out the solution !!! as you are new member to this forum, you can accept this as an answer (and you can upvote as well 😉 )
Try this
*UPDATED*
index=my_index | rex field=version "(?<ver_no>\d+)" | eventstats max(ver_no) as current_version min(ver_no) as oldest_version by id | where current_version=1 AND current_version=oldest_version
Where does the "ver_no" value come from in this example? Should that be specified as the output from the rex
command somehow?
Try the edited version
Try this
index=my_index | rex field=version "(?<ver_no>\d+)" | convert auto(ver_no) | eventstats max(ver_no) as current_version by id | where NOT (current_version=version)
Thank-you - this approach works (it's similar to the one inventsekar has posted below also).
Using max() ... by id
to get the most recent version from that ID was the key thing - that's really handy thing to learn about, thanks!
See the edited answer.
Here's one solution, which is not as efficient as yours since it reads more data, but it should give the desired results.
index=my_index | dedup id | where version=V1 | ...
Thanks. Unfortunately when I try this query it returns no results.
What is Splunk operating on after the |dedup command? How does this pick out the right set of IDs?