Splunk Search

How to remove duplicate events from the index, not just at search time?

jadengoho
Builder

I have a lot of data, including duplicates, and I want to remove the duplicate events from the index itself. I don't want to use the dedup command, since it only removes events from the search results, not from the index. Can somebody help me?


niketn
Legend

@jadengoho, are these duplicates only in old data, or will your data keep having duplicates in the future as well? If there will be more duplicates, what is the source, cause, and frequency of the duplicate data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

jadengoho
Builder

It is daily log data, so the duplicates are a problem because they just keep stacking up.


niketn
Legend

If you can fix the data during ingestion, that would be best. Otherwise, you can run a daily scheduled search (timed to run after the data is ingested) that lists the day's data with dedup and pushes it to a separate index.

Refer to Splunk Documentation: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#Moving_events_to_a_diffe...

PS:
You can use the collect command to do this; however, to me it seems like overhead compared with fixing the problem prior to indexing.
You could also consider a scripted input if there is no other way to prevent duplicate events from being indexed.
If you use the collect command with a sourcetype other than stash, the collected data will count against your license.
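
As a rough sketch of the scheduled search described above: the index names your_daily_index and your_dedup_index are placeholders (assumptions, not names from this thread), the target index must already exist, and the search assumes that two events are duplicates exactly when their raw text (_raw) is identical.

```spl
index=your_daily_index earliest=-1d@d latest=@d
| dedup _raw
| collect index=your_dedup_index
```

Scheduled once a day, this would copy one instance of each of yesterday's events into the second index. Because no sourcetype is given to collect, the events keep the default stash sourcetype. If your duplicates differ in _raw but match on some set of fields, the dedup line would need to list those fields instead.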


mjlsnombrado
Communicator

I have the same problem. Do I need to use a script to fix this issue? If yes, what kind of script should I use?


nickhills
Ultra Champion

You will need to create a search which finds your duplicated data, and returns all but the last copy (or first - depending on your needs).
Once you are happy that your search correctly identifies ONLY the duplicated events, you can pipe the results to | delete, which will remove the data from the indexes.

You will need to be a user with 'can delete' permissions (the can_delete role) - no user has this by default (not even admin) so you may need to add this role to your user first - it's also a good idea to remove it when you have finished to prevent accidents! (been there)

It's worth noting that this will not remove the data from disk - it simply marks it as deleted in the buckets, so it won't be returned in future searches.
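
One possible shape for the "find everything but one copy" search, offered as a sketch rather than a tested recipe. The index name your_daily_index and the field name copy_number are placeholders, and it assumes duplicates have identical _raw. Because events come back newest first, streamstats numbers the most recent copy 1 and the older copies 2, 3, and so on, so the where clause leaves only the older copies in the results.

```spl
index=your_daily_index earliest=-1d@d latest=@d
| streamstats count AS copy_number BY _raw
| where copy_number > 1
```

Run it like this first and check that every event it returns really is a surplus copy. Only then append | delete as the final command. I would try that on a test index before production, since the deletion cannot be undone from search and I can't promise this pipeline behaves the same on every Splunk version.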

If my comment helps, please give it a thumbs up!