The issue: a file that is being monitored was also ingested via batch, so the same data came in twice. The back story is not critical; we know what happened and it shouldn't happen again. We now have duplicates in our index going back to 03/16/2021, and the user of this data wants the duplicates removed. I have looked at solutions for removing duplicates, but with the amount of data involved they would be very time consuming.
The user asks: can we remove all the data matching a given index, host, sourcetype, and source, and then reload the data?
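For concreteness, the removal I have in mind is something like the SPL delete command scoped to those four fields. This is only a sketch; the index, host, sourcetype, source, and time range below are placeholders, and I know the search has to be run by a role with the can_delete capability:

    index=my_index host=my_host sourcetype=my_sourcetype source="/path/to/file.log" earliest=03/16/2021:00:00:00
    | delete

My understanding is that delete only marks the events unsearchable rather than reclaiming disk space, which would be acceptable here.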
My process would be (for each file being monitored):
1) Turn off monitoring of the file (see the inputs.conf sketch after this list).
2) Remove the matching data (e.g., with the delete search above).
3) Turn monitoring of the file back on.
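For steps 1 and 3, what I had in mind is simply toggling the monitor stanza in inputs.conf on the forwarder and restarting it so the change takes effect; the path below is just a placeholder:

    # inputs.conf -- step 1: disable the input while the old events are removed
    [monitor:///path/to/file.log]
    disabled = true

    # step 3: once the delete has completed, re-enable the input
    [monitor:///path/to/file.log]
    disabled = false

Which leads to my question about what happens once the input is re-enabled.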
When monitoring is turned back on, will Splunk ingest the entire file the first time it is updated?
I am open to other solutions to this as well.
Thank you!