Splunk Search

How to collect and dedup the newest to a new Index

sai33
Explorer

Hello Splunkers,

I've got an existing index which I would like to process and collect into a new index. My rough idea is as follows:

  • Use sort to get the latest (newest) event in the existing index, grouped by ID.
  • Collect (copy) only the first (newest) event per ID from the existing index into a new index.

My sample data in the existing index looks like this:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
1, Purchase, 11.08.2019-15:30
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

The data in my new index should be collected from the index above:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

Note that the second event for ID 1 is not present in the new index.

I believe this should be possible using Sort, Dedup and Collect. Please suggest the best possible method. I need to move an index of around 5GB.

Thanks!!


niketn
Legend

@sai33 does the DateTime field in index1 correspond to the _time field in your data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

sai33
Explorer

I'm not exactly sure what _time you're referring to, but DateTime is the timestamp (date and time) of the event.
I'm a newbie to Splunk, so I'm still getting used to the technical terms.
Sorry for the trouble!


niketn
Legend

For the community to assist you better, you would need to provide your current SPL (mock/anonymize any sensitive information before posting it).

Can you print the following table and check whether _time has the same value as DateTime?

<yourIndex1Query>
| table _time ID Action DateTime

_time is the time of the event, which you define when the data is indexed in Splunk. It is one of the most crucial pieces of information Splunk needs at index time, because an incorrect timestamp on an indexed event means that your correlations and time-based queries will not work as expected.

While this is not directly related to the answer to your question, I would recommend understanding it as the first step towards indexing data correctly. Refer to the documentation: https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
The second most crucial step is event breaking, which tells Splunk where each event begins and ends as it processes streaming data input. Incorrect event breaks can cause events to be merged together or dropped. So read the following documentation as well: https://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
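
Once you know whether _time matches DateTime, the Sort, Dedup and Collect idea from your question could be sketched roughly as below. This is only a sketch under assumptions, not a tested answer for your data: <yourIndex1> and <yourNewIndex> are placeholder names, the strptime format string is inferred from your sample values (11.08.2019-16:00), event_time is a helper field name made up for the sort, and the new index has to exist already before collect can write to it.

```
index=<yourIndex1>
| eval event_time=strptime(DateTime, "%d.%m.%Y-%H:%M")
| sort 0 - event_time
| dedup ID
| collect index=<yourNewIndex>
```

If _time already matches DateTime, the eval can be dropped and the sort done on _time instead. The 0 in sort 0 lifts the default 10,000-result limit of sort, which matters for an index of around 5GB. It is worth running the search over a short time range first and checking the output with | table _time ID Action DateTime before adding the collect step.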

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"