Splunk shows duplicate events in search results when there are no duplicates in the source file.

wpreston
Motivator

When I run a search in Splunk, the results show some duplicate events. I have checked the source file and the events are not duplicated there, so I'm not sure why Splunk is showing duplicates. It's not duplicating every event in the index, and I'm not sure how many of the events it is duplicating, but I know that I have seen it for some (but not all) events of a certain class. It may be happening to others that I just haven't come across yet.

I can use dedup with some options to keep these out of the search results, but that works around the problem rather than solving it. Is there a way to stop Splunk from creating these duplicates in the index, so that I don't have to use dedup in every search?

1 Solution

yannK
Splunk Employee

First, identify which events are duplicated.

  • Verify whether they are coming from exactly the same host / source / sourcetype:

myduplicateevent | stats count values(host) values(source) values(sourcetype) values(index) by _raw | where count>1

  • Check the _indextime to see when each duplicate event was indexed:

myduplicateevent | convert ctime(_indextime) AS indextime | table _time indextime _raw

Maybe your log files are rotating and Splunk is detecting the rotated copy as a new log file to index.
Please check:

  • whether you are using the crcSalt option
  • how your files are rotated, and whether the first lines of a file are modified during rotation
  • symlinks: verify that multiple symlinks are not pointing to the same file or folder
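
For the symlink check, here is a minimal Python sketch that groups a list of paths by the real file or folder they resolve to. It is not a Splunk tool; the function name find_shared_targets and the example paths are made up for illustration, and you would substitute the paths from your own monitor inputs.

```python
import os
from collections import defaultdict


def find_shared_targets(paths):
    """Group paths by the real file or directory they resolve to.

    Returns only the groups where more than one path resolves to the
    same target, i.e. the candidates for being indexed twice.
    """
    by_target = defaultdict(list)
    for path in paths:
        # realpath follows symlinks and normalises the path
        by_target[os.path.realpath(path)].append(path)
    return {target: sorted(group)
            for target, group in by_target.items()
            if len(group) > 1}


if __name__ == "__main__":
    # Hypothetical monitored paths; replace with the paths from your inputs.
    monitored = ["/var/log/app/current.log", "/var/log/app/app.log"]
    for target, group in find_shared_targets(monitored).items():
        print(f"{target} is reached via: {', '.join(group)}")
```

Any target that is printed is reachable through more than one monitored path, so it is worth checking whether both paths are being indexed.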

sumituv
New Member

I am facing the same issue, even when I search for a specific file name.

I removed crcSalt= from inputs.conf, but it made no difference.

gacerioni
Engager

If you are using "crcSalt=<SOURCE>" with rotated logs, this could also cause duplicates.
This happens because the rotated file may stay in the same directory under a different name.

Finally, if your monitor uses wildcards that also match the names of the rotated files, you'll get duplicate events.
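
To illustrate why the salt matters, here is a simplified Python model of a file-identity check. It is only a sketch and not Splunk's actual implementation: zlib.crc32, the 256-byte head size, the function name file_key and the file names are all assumptions chosen for the example. The idea is that a checksum over the start of the file stays the same when the file is renamed, but once the path is mixed in, the renamed file no longer matches.

```python
import zlib

INIT_CRC_LENGTH = 256  # assumed size of the file head that gets checksummed


def file_key(path, content, salt_with_source=False):
    """Simplified stand-in for a file-identity check.

    Checksums the head of the content. If salt_with_source is True, the
    path is mixed in as well, loosely mimicking crcSalt = <SOURCE>.
    """
    head = content[:INIT_CRC_LENGTH]
    if salt_with_source:
        head = path.encode("utf-8") + head
    return zlib.crc32(head)


if __name__ == "__main__":
    data = b"sample log line\n" * 50  # made-up file content
    # Hypothetical names: the live log and its renamed, rotated copy.
    live, rotated = "/var/log/app.log", "/var/log/app.log.1"

    # Unsalted: identical content gives identical keys for both names.
    print(file_key(live, data) == file_key(rotated, data))
    # Salted with the path: compare the keys for the two names again.
    print(file_key(live, data, True) == file_key(rotated, data, True))
```

In this model the rotated copy only looks like a new file when the path is part of the key, which matches the behaviour described above for rotated logs that stay in a monitored directory.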

wpreston
Motivator

This was indeed the problem. It looks like Splunk indexed some of my events twice, once at 8 am and once at 1 pm yesterday. I'll have to dig in to figure out why. I'm still a Splunk newbie, so this was very helpful :)

Ayn
Legend

No, you're supposed to use dedup all the time.

...kidding 😉
This obviously is not the behaviour you should be seeing, but we need more information than just that you get duplicates. A normal instance of Splunk indexing 'normal' logs will not produce duplicates. You're seeing duplicates because you're not configuring Splunk correctly, or you're indexing logs that confuse Splunk in one way or another, or both. Please give us more details on what you are indexing and how you have set up Splunk.
