When I run a search in Splunk, the results show some duplicate events. I have checked the source file and the events are not duplicated there, so I'm not sure why Splunk is showing duplicates. It's not duplicating every event in the index, and I'm not sure how many of the events it is duplicating, but I know that I have seen it for some (but not all) events of a certain class. It may be happening to others that I just haven't come across yet.
I can use dedup with some options to hide these in the search results, but that works around the problem rather than solving it. Is there a way to stop Splunk from creating these duplicates in the index, so that I don't have to use dedup in every search?
First, identify which events are duplicated:
myduplicateevent | stats count values(host) values(source) values(sourcetype) values(index) by _raw | WHERE count>1
Then compare the event time with the time each copy was indexed:
myduplicateevent | convert ctime(_indextime) AS indextime | table _time indextime _raw
Your log files may be rotating, and Splunk may be detecting the rotated copy as a new log file to index.
Please check the following:
I am facing the same issue, even when I search for a specific file name.
I removed crcSalt= from inputs.conf, but the result did not change.
If you are using "crcSalt=<SOURCE>" with rotated logs, this can also cause duplicates.
This happens because the rotated file may stay in the same directory under a different name, so Splunk treats it as a new source.
Finally, if your monitor stanza uses wildcards that also match the names of the rotated files, you can end up with duplicate events.
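As a rough illustration of the two checks above, a monitor stanza along these lines keeps a wildcard but excludes rotated copies and leaves crcSalt unset. This is only a sketch: the path, the rotation naming scheme (app.log.1, app.log.2.gz) and the sourcetype are assumptions, so adjust them to match how your logs actually rotate.

```
# inputs.conf (sketch; path, file names and sourcetype are hypothetical)
[monitor:///var/log/myapp/app.log*]
# Skip rotated copies such as app.log.1 or app.log.2.gz
blacklist = \.log\.\d+(\.gz)?$
# crcSalt = <SOURCE> is deliberately not set here, so a file that is only
# renamed during rotation is not treated as a brand new source
sourcetype = myapp
```

Changing this only affects what gets indexed from now on. Duplicates that are already in the index stay there.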
This was indeed the problem. It looks like Splunk indexed some of my events twice, once at 8 am and once at 1 pm yesterday. I'll have to dig in to figure out why. I'm still a Splunk newbie, so this was very helpful :)
No, you're supposed to use dedup
all the time.
...kidding 😉
This obviously is not the behaviour you should be seeing, but we need more information than just that you get duplicates. A normal instance of Splunk indexing 'normal' logs will not produce duplicates. You're seeing duplicates because you're not configuring Splunk correctly, or you're indexing logs that confuse Splunk in one way or another, or both. Please give us more details on what you are indexing and how you have set up Splunk.