Is it possible to dedup events before they are indexed?

nijjie · 03-03-2017

Using this search:

index=ets2 source="my_source" | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval delete_id=mvindex(id, 1, -1) | stats count by delete_id | fields - count

I find approx. 500,000 duplicate events in 24 hours. I would like to dedup them prior to indexing. Is this possible?

somesoni2 · 03-03-2017

I don't think Splunk can identify or remove duplicates during indexing. The options would be to remove the duplicates at the source that generates the log, or to pre-process the log after it is generated and before it is indexed.
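As a rough illustration of the second option, here is a minimal Python sketch of a pre-processing step that drops exact repeats of a line before the file is handed to Splunk. It is not a Splunk feature; the function names `dedup_lines` and `dedup_file`, the `max_remembered` limit, and the file paths are all hypothetical. It assumes duplicates are byte-for-byte identical lines (timestamp included), which is roughly what the `transaction _raw` search above matches on.

```python
import hashlib
from collections import OrderedDict
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str], max_remembered: int = 100_000) -> Iterator[str]:
    """Yield each line the first time it is seen and skip exact repeats.

    Only the most recent `max_remembered` distinct lines are remembered
    (hypothetical default), so memory stays bounded on large logs.
    """
    seen: "OrderedDict[bytes, None]" = OrderedDict()
    for line in lines:
        # Hash the line without its trailing newline to keep the cache small.
        key = hashlib.sha1(line.rstrip("\n").encode("utf-8")).digest()
        if key in seen:
            seen.move_to_end(key)  # keep recently repeated lines remembered
            continue
        seen[key] = None
        if len(seen) > max_remembered:
            seen.popitem(last=False)  # forget the oldest remembered line
        yield line


def dedup_file(src_path: str, dst_path: str) -> None:
    """Write a deduplicated copy of src_path to dst_path (paths are examples)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(dedup_lines(src))
```

One possible use would be to call `dedup_file("app.log", "app.dedup.log")` on a schedule and point the Splunk input at the deduplicated copy instead of the original. If the duplicates differ only in their timestamps, the key would have to be built from the line with the timestamp stripped, which this sketch does not attempt.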

hartfoml · 03-03-2017

Interesting question. Why is the system writing duplicate logs, or are the timestamps different on each of the logs? This could be a case where the system writes the same log _id every time it finds it, but with different timestamps. A machine is unlikely to make this kind of mistake on its own; more likely the programmer told the machine to write the logs in this unusual fashion.

