OK, we are currently receiving two sets of data: a preliminary version (received first) and a finalised version (received later). Both sets have the same structure and, after import into the same sourcetype, the same _time values. Only some of the values differ (e.g. Vol).
When performing calculations, we only want to use the most recent value for each time.
Prelim data
UID, In Date, Update Time, Vol, Corr Vol
453,May 1 2012 6:00AM,May 2 2012 3:24PM,133,223.000000000
453,May 1 2012 7:00AM,May 2 2012 3:24PM,104,175.000000000
453,May 1 2012 8:00AM,May 2 2012 3:24PM,90,152.000000000
Final data
UID, In Date, Update Time, Vol, Corr Vol
453,May 1 2012 6:00AM,May 2 2012 3:24PM,140,223.000000000
453,May 1 2012 7:00AM,May 2 2012 3:24PM,110,175.000000000
453,May 1 2012 8:00AM,May 2 2012 3:24PM,93,152.000000000
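To make the "same _time, different values" point concrete, here is a minimal Python sketch that parses the sample rows above. It assumes that In Date is the column Splunk extracts as _time and that its format is `%b %d %Y %I:%M%p`. It only illustrates the data and is not part of the Splunk app.

```python
from datetime import datetime

# Rows copied from the prelim and final samples (UID, In Date, Update Time, Vol, Corr Vol).
PRELIM = [
    "453,May 1 2012 6:00AM,May 2 2012 3:24PM,133,223.000000000",
    "453,May 1 2012 7:00AM,May 2 2012 3:24PM,104,175.000000000",
    "453,May 1 2012 8:00AM,May 2 2012 3:24PM,90,152.000000000",
]
FINAL = [
    "453,May 1 2012 6:00AM,May 2 2012 3:24PM,140,223.000000000",
    "453,May 1 2012 7:00AM,May 2 2012 3:24PM,110,175.000000000",
    "453,May 1 2012 8:00AM,May 2 2012 3:24PM,93,152.000000000",
]

# Assumption: "In Date" is the column that becomes _time.
IN_DATE_FORMAT = "%b %d %Y %I:%M%p"


def parse_row(line):
    """Split one CSV row into a dict keyed like the header line."""
    uid, in_date, _update_time, vol, corr_vol = line.split(",")
    return {
        "UID": int(uid),
        "_time": datetime.strptime(in_date, IN_DATE_FORMAT),
        "Vol": int(vol),
        "Corr Vol": float(corr_vol),
    }


prelim = [parse_row(r) for r in PRELIM]
final = [parse_row(r) for r in FINAL]

# Both versions land on exactly the same timestamps...
assert [e["_time"] for e in prelim] == [e["_time"] for e in final]

# ...but carry different Vol values, so _time alone cannot tell them apart.
for p, f in zip(prelim, final):
    print(p["_time"], p["Vol"], "->", f["Vol"])
```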
I know I can use the following search, and it will get the most recent version:
sourcetype="Flow" UID=453 | dedup _time
While this works, the behaviour is undocumented, and we would hate for such a 'feature' to change and then break the Splunk app we are developing.
Can someone confirm whether this is the only way to achieve this, or is there a better way?
What is undocumented? dedup _time? While I guess that PARTICULAR usage of dedup might not be explicitly stated in the docs, both the dedup command and the _time field are definitely not going anywhere soon.
However, I don't know whether there's any guarantee that, given two events with identical timestamps, Splunk is going to keep the newest one. I would consider differentiating the events using the field it would check anyway to see which event is newer: _indextime, which is exactly what it sounds like, a field containing the time (in epoch format) when Splunk indexed an event.
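To illustrate the concern: as I read the dedup documentation, dedup keeps the first event it encounters for each value and discards the rest, and for historical searches events are encountered newest _time first. When two versions share the same _time, that ordering says nothing about which of them comes first. The Python sketch that follows is a model of keep-first deduplication, not Splunk's implementation, and the epoch values in it are hypothetical. It shows that the surviving version flips with the arrival order.

```python
def dedup_keep_first(events, key):
    """Keep the first event seen for each value of `key`, dropping later ones.

    A model of keep-first deduplication, not Splunk's actual implementation.
    """
    seen = set()
    kept = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            kept.append(event)
    return kept


# Hypothetical events: same _time, two versions. The epoch values are made up.
prelim = {"_time": 1335852000, "_indextime": 1335973440, "Vol": 133}
final = {"_time": 1335852000, "_indextime": 1336060000, "Vol": 140}

# The same two events in both possible orders for the tie on _time.
print(dedup_keep_first([final, prelim], "_time")[0]["Vol"])  # 140
print(dedup_keep_first([prelim, final], "_time")[0]["Vol"])  # 133
```

Nothing in the _time key itself decides the winner; only the arrival order does, which is why a tie-breaker such as _indextime is needed.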
Thank you, _indextime would be perfect.
I wasn't suggesting that dedup was undocumented or would go away, but rather that the way it behaves with _time might change. That was the undocumented part I was referring to.
The following search will give consistent results:
sourcetype="Flow" UID=453 | dedup _time sortby -_indextime
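The rule this search is meant to express is: for each _time, keep the event with the largest _indextime. A minimal Python sketch of that rule follows. It is independent of Splunk, the epoch values are hypothetical, and it models the intent of the search rather than how dedup implements sortby. It shows that picking the maximum explicitly makes the outcome independent of the order in which events arrive.

```python
def newest_per_time(events):
    """For each _time, keep only the event with the largest _indextime.

    The result does not depend on the order of `events`.
    """
    newest = {}
    for event in events:
        current = newest.get(event["_time"])
        if current is None or event["_indextime"] > current["_indextime"]:
            newest[event["_time"]] = event
    # Return newest _time first.
    return sorted(newest.values(), key=lambda e: e["_time"], reverse=True)


# Hypothetical epoch values: three hourly _time slots, prelim indexed before final.
SLOTS = [1335852000, 1335855600, 1335859200]
PRELIM_INDEXTIME = 1335973440
FINAL_INDEXTIME = 1336060000

prelim = [{"_time": t, "_indextime": PRELIM_INDEXTIME, "Vol": v}
          for t, v in zip(SLOTS, [133, 104, 90])]
final = [{"_time": t, "_indextime": FINAL_INDEXTIME, "Vol": v}
         for t, v in zip(SLOTS, [140, 110, 93])]

# Either arrival order yields the final values.
for order in (prelim + final, final + prelim):
    print([e["Vol"] for e in newest_per_time(order)])  # [93, 110, 140]
```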