
Duplicate indexing of data

soumdey
Path Finder

I have a situation here.

I have an abc.txt file on server1 which I am monitoring using a forwarder.

The abc.txt file is updated every hour in such a way that the entire content of the file is cleared and then the same content is written back to abc.txt.

The issue is that Splunk re-indexes the data from abc.txt every time the content is removed and written back, which results in the same data being duplicated multiple times.

Can somebody please help me rectify this issue? Do I need to change the initCrcLength value?
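For reference, a minimal sketch of the relevant monitor stanza in inputs.conf on the forwarder (the path, index, and sourcetype here are hypothetical):

[monitor:///opt/app/abc.txt]
index = main
sourcetype = abc_txt
# initCrcLength sets how many leading bytes Splunk hashes to identify a
# file (the default is 256). Raising it helps when different files share
# an identical header, but it does not prevent re-indexing of a file that
# is truncated and rewritten with the same content, as in this case.
initCrcLength = 1024
# crcSalt = <SOURCE> adds the file path to the fingerprint; use it with
# care, since it can itself cause duplicates when files are rewritten.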

1 Solution

soumdey
Path Finder

After some R&D, I figured out what was causing the issue and how to fix it.

The issue was with how the script was writing data to the output file that Splunk was forwarding from.
The script was configured to erase the existing data from the file and then write back the existing data plus the new data.
This made Splunk believe it was all new data and index the same data over again, so every run of the script created another set of duplicates.

To fix the issue, instead of rewriting the complete file on every run, we changed the script to append only the new data to the file, which avoids any duplication.
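Roughly, the change looks like this (a minimal Python sketch; the actual script is not shown in this thread, and the file path, function name, and sample record below are hypothetical):

OUTPUT_FILE = "abc.txt"  # hypothetical path to the monitored output file

def write_records(new_records):
    # Before the fix: open(OUTPUT_FILE, "w") truncated the file and rewrote
    # old + new data, so the monitor input saw a truncated, "new" file and
    # re-indexed everything from the beginning.
    # After the fix: "a" (append) mode leaves the existing bytes untouched,
    # so Splunk reads only the data past its saved seek pointer.
    with open(OUTPUT_FILE, "a") as f:
        for record in new_records:
            f.write(record.rstrip("\n") + "\n")

# Example usage with a hypothetical record:
write_records(["2024-01-01 12:00:00 event=login user=alice"])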

Hope I made it clear for everyone following the question.


arunsunny
Path Finder

@soumdey - The great thing is that you also reduced duplicate Splunk license usage.



soumdey
Path Finder

Can somebody please help me out here?
