Splunk Dev

How do I deduplicate events with such conditions?

szabados
Communicator

I have multiple custom data sources, mainly scripts, that send events to Splunk on a schedule.
I can distinguish each execution of these sources by either a timestamp or a custom ID, which is incremented with every execution and captured in every event. The events always have a proper host field, which is also part of the "unique key" of an event, together with the ID mentioned above. The hosts also have custom attributes, which are the third part of what could be used as a unique key. These attributes are present in the events as long as they apply to a given host, and are no longer present once they stop applying.

An example of what I mean (every line is a separate event):

  • hostID=host1, attributeID=attribute1, customid=customid1
  • hostID=host1, attributeID=attribute2, customid=customid1
  • hostID=host2, attributeID=attribute1, customid=customid2
  • hostID=host1, attributeID=attribute1, customid=customid2

(Because of the _time field, these would obviously appear in Splunk in reverse order.)

I want to deduplicate these events so that I only ever keep the data from the most recent execution of a script. From the example above, I want to keep only:

  • host2, attribute1, customid2
  • host1, attribute1, customid2

If I were to use

| dedup hostID, attributeID, customid

It would yield me
- host1, attribute2, customid1
- host2, attribute1, customid2
- host1, attribute1, customid2

The solution my team came up with is

<base search> | eventstats max(customid) as max_customid by hostID | where customid=max_customid

This pretty much does the job, but I feel it is not very efficient. What would be the right approach to do this?

===EDIT

A given host has multiple events (with multiple attributes) from the same execution of the script.
A more detailed example: let's say I have these events:

  • hostID=host1, attributeID=attribute1, customid=customid1
  • hostID=host1, attributeID=attribute2, customid=customid1
  • hostID=host2, attributeID=attribute1, customid=customid2
  • hostID=host1, attributeID=attribute1, customid=customid2
  • hostID=host1, attributeID=attribute3, customid=customid2
  • hostID=host1, attributeID=attribute4, customid=customid2
  • hostID=host2, attributeID=attribute3, customid=customid2

I want to keep the below events:

  • hostID=host2, attributeID=attribute1, customid=customid2
  • hostID=host1, attributeID=attribute1, customid=customid2
  • hostID=host1, attributeID=attribute3, customid=customid2
  • hostID=host1, attributeID=attribute4, customid=customid2
  • hostID=host2, attributeID=attribute3, customid=customid2

This is the reason I can't use stats first()
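
For anyone who wants to try this without the real data, here is a rough run-anywhere sketch of the eventstats approach against the seven sample events above. The raw field and the makeresults/makemv/rex scaffolding are only assumptions used to fake the events; in a real search they would be replaced by the base search. It uses where for the final comparison because where compares two fields, whereas search treats the right-hand side as a literal string.

| makeresults
| eval raw="host1,attribute1,customid1;host1,attribute2,customid1;host2,attribute1,customid2;host1,attribute1,customid2;host1,attribute3,customid2;host1,attribute4,customid2;host2,attribute3,customid2"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<hostID>[^,]+),(?<attributeID>[^,]+),(?<customid>[^,]+)$"
| eventstats max(customid) as max_customid by hostID
| where customid=max_customid
| table hostID, attributeID, customid

With the sample values this should leave only the five customid2 rows. One caveat: max() on non-numeric values compares lexicographically, so string IDs such as customid10 would sort before customid2; a numeric ID or the execution timestamp is the safer thing to take the maximum of.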

0 Karma

woodcock
Esteemed Legend

Let's baseline. These stats pairs are similar: first/last, earliest/latest, min/max. The last pair, I think, is obvious, but the first pair is not the same as the second pair, which is what many people assume at first. If your events have not been re-sorted, they should (and this is a big "should", because sometimes Splunk fails to do this and doesn't always generate a warning) come back to you sorted from newest to oldest, with the newest on top. In such a case, first does the same thing as latest. Let that sink in: first DOES NOT do the same thing as earliest; it does the OPPOSITE. That is because what first actually does is walk through your events from the top (which by default should be the "latest" event) and grab the "first" one that it sees.

OK, so for your case, simply sort your events the way that you desire (you can have multiple layers of sort by using more than 1 field argument) and then use first or dedup.

Pro tip: be sure that you use sort 0, not just sort.
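
As a small sketch of the difference, assuming the hostID and customid fields from the question (the output names newest_by_order and latest_by_time are made up for illustration):

<base search>
| sort 0 hostID, -customid
| stats first(customid) as newest_by_order, latest(customid) as latest_by_time by hostID

Here first() follows the order produced by sort, so it returns the highest customid per host, while latest() ignores the sort and picks the value from the event with the most recent _time. The 0 in sort 0 lifts the default limit of 10,000 results, so no events are silently dropped before stats runs. Note that this returns one row per host, not one per attribute.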

0 Karma

somesoni2
SplunkTrust

How about you just do dedup on host??

0 Karma

s2_splunk
Splunk Employee

Have you tried the "first" function with the stats command? <base search> | eval myKey=attributeID.customid | stats first(myKey) by hostID

0 Karma

szabados
Communicator

Unfortunately, that is not what I need; please see my update to the original post above.

0 Karma

s2_splunk
Splunk Employee
<base search> | eval myKey=hostID.attributeID.customid | dedup myKey

This should do what you want. dedup keeps the first event it encounters for each value of the combined key, which in the default search order is the most recent one.
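
For comparison, and assuming all three fields are present on every event, deduplicating on the concatenated key should behave the same as listing the fields directly, which avoids the extra eval:

<base search>
| dedup hostID, attributeID, customid

Field names in Splunk are case-sensitive, so the name must match the events exactly (customid, as in the question). Either form keeps one event per distinct host/attribute/ID combination, so attributes that only appear in an older execution are still returned.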

0 Karma