
Searching most recent events with the same _time

phoenixdigital
Builder

OK, we are currently receiving two sets of data: a preliminary version (received first) and a finalised version (received later). Both sets have the same structure and the same _time values after import into the same sourcetype, but the values themselves can differ (compare the Vol column below).

When performing calculations we only want to use the most recent value for each time.

Prelim data

UID, In Date, Update Time, Vol, Corr Vol
453,May 1 2012 6:00AM,May 2 2012 3:24PM,133,223.000000000
453,May 1 2012 7:00AM,May 2 2012 3:24PM,104,175.000000000
453,May 1 2012 8:00AM,May 2 2012 3:24PM,90,152.000000000

Final data

UID, In Date, Update Time, Vol, Corr Vol
453,May 1 2012 6:00AM,May 2 2012 3:24PM,140,223.000000000
453,May 1 2012 7:00AM,May 2 2012 3:24PM,110,175.000000000
453,May 1 2012 8:00AM,May 2 2012 3:24PM,93,152.000000000

Now I know I can use the following search and it will get the most recent version:

sourcetype="Flow" UID=453 | dedup _time
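To see why the result of that search hinges on ordering, here is a minimal Python model (not Splunk code) of dedup's behaviour: it keeps the first event it encounters for each _time value and drops the rest. The event dicts and the `dedup_first` helper are hypothetical illustrations built from the sample rows above.

```python
# Minimal model of dedup's "keep the first event per value" behaviour.
# The event dicts are hypothetical stand-ins built from the sample rows;
# "_time" holds the parsed "In Date" column.
prelim = [
    {"_time": "2012-05-01 06:00", "Vol": 133},
    {"_time": "2012-05-01 07:00", "Vol": 104},
    {"_time": "2012-05-01 08:00", "Vol": 90},
]
final = [
    {"_time": "2012-05-01 06:00", "Vol": 140},
    {"_time": "2012-05-01 07:00", "Vol": 110},
    {"_time": "2012-05-01 08:00", "Vol": 93},
]


def dedup_first(events, field):
    """Return events in input order, keeping only the first one per `field` value."""
    seen = set()
    kept = []
    for event in events:
        if event[field] not in seen:
            seen.add(event[field])
            kept.append(event)
    return kept


# Which copy survives is decided purely by which one is encountered first.
print([e["Vol"] for e in dedup_first(final + prelim, "_time")])  # [140, 110, 93]
print([e["Vol"] for e in dedup_first(prelim + final, "_time")])  # [133, 104, 90]
```

In Splunk the order is whatever the search pipeline produces, and for two events with an identical _time that order is exactly what this thread is unsure about.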

While this works, the behaviour is undocumented, and we would hate for such a 'feature' to change and then break the Splunk app we are developing.

Can someone confirm whether this is the only way to achieve this, or is there a better way?


Ayn
Legend

What is undocumented? dedup _time? While I guess that PARTICULAR usage of dedup might not be explicitly described in the docs, both the dedup command and the _time field are definitely not going anywhere soon.

But I don't know whether there's any guarantee that, given two events with identical timestamps, Splunk will choose the newest one. I would consider differentiating the events using the field it would check anyway to see which event is newer: _indextime, which is exactly what the name says, a field containing the time (in epoch format) when Splunk indexed the event.
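A Python model of that suggestion (again, not Splunk code): order the events newest-indexed first, then keep the first event per _time, so the finalised copy wins regardless of the order the events arrive in. The `_indextime` epoch values and the `latest_per_time` helper are hypothetical; in Splunk, _indextime is assigned when the event is indexed.

```python
from operator import itemgetter

# Hypothetical events: same _time values, different Vol, and made-up
# _indextime epochs where the final copy was indexed a day after the prelim one.
events = [
    {"_time": "2012-05-01 06:00", "Vol": 133, "_indextime": 1335972240},
    {"_time": "2012-05-01 07:00", "Vol": 104, "_indextime": 1335972240},
    {"_time": "2012-05-01 08:00", "Vol": 90, "_indextime": 1335972240},
    {"_time": "2012-05-01 06:00", "Vol": 140, "_indextime": 1336058640},
    {"_time": "2012-05-01 07:00", "Vol": 110, "_indextime": 1336058640},
    {"_time": "2012-05-01 08:00", "Vol": 93, "_indextime": 1336058640},
]


def latest_per_time(events):
    """Keep, for each _time, the event with the highest _indextime."""
    # Sort newest-indexed first so the first event seen per _time is the latest copy.
    newest_first = sorted(events, key=itemgetter("_indextime"), reverse=True)
    seen = set()
    kept = []
    for event in newest_first:
        if event["_time"] not in seen:
            seen.add(event["_time"])
            kept.append(event)
    # Return in chronological order of _time for readability.
    return sorted(kept, key=itemgetter("_time"))


print([e["Vol"] for e in latest_per_time(events)])        # [140, 110, 93]
print([e["Vol"] for e in latest_per_time(events[::-1])])  # [140, 110, 93]
```

The point of the design is that the tie-breaker (_indextime) is explicit, instead of relying on whatever order the pipeline happens to produce for identical timestamps.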

phoenixdigital
Builder

Thank you, _indextime would be perfect.

I wasn't thinking dedup was undocumented or would go away, but rather that the way it behaves with _time might change. That was the undocumented part I was referring to.

sourcetype="Flow" UID = 453 | dedup _time sortby -_indextime

will give consistent results.
