Splunk duplicating events every time file changes

bitbuck3t
New Member

I have created a directory to store log files that I pull from a remote machine. A cronjob runs every x minutes and calls a script that rsyncs the files over. Splunk is configured to monitor this directory. Using auth.log as an example, Splunk indexes that file as expected the first time it appears in the directory. After that, any time the file changes, Splunk re-indexes the entire file. So if there were 500 events initially, and the cronjob runs and there are now 510 events in the file (10 more than last time), Splunk will show 1010 events (the original 500 plus all 510 again).

I also discovered that I can trigger Splunk to reindex the entire file simply by using the touch command on the file, leaving everything else about the file intact.

What I don't understand is that the *nix app automatically monitors /var/log on the Splunk machine and that behaves as I would expect. Only new events are added as the file changes and using touch on any of the files does not cause that file to be completely reindexed.

I have tried using rsync in append mode. I have also tried using the atomic-rsync perl script which basically rsyncs files to a temporary directory and then after everything has transferred, does a rename operation over the old files. Nothing I have tried so far seems to work.

I am new to Splunk, so I have to assume I am doing something wrong, but I really need to figure out what that might be because having my events constantly being replicated in full is not good.

My inputs.conf for the directory in question is:

[monitor:///usr/local/splunk/etc/apps/unix/local/remote_logs/machine1]
disabled = false
followTail = 0
host = 
host_segment = 9
index = os

I'm using Splunk version 4.1.5, build 85165 on FreeBSD 8.1-release


amrit
Splunk Employee

Based on this:

I can trigger Splunk to reindex the entire file simply by using the touch command on the file

it sounds like you've enabled CHECK_METHOD=modtime in props.conf. This is a setting that abandons normal file tracking (for the specified files) and instead reindexes them in their entirety when the modtime changes.

If you don't think you've done so, can you list which sourcetypes Splunk is assigning to these files?

bitbuck3t
New Member

Thank you! I didn't explicitly set that, but I do remember that Splunk was detecting these files as config_file, which uses modtime. I had overridden the sourcetype to syslog in the local props.conf, but the override apparently doesn't pick up the CHECK_METHOD that syslog files use. I explicitly set CHECK_METHOD to endpoint_md5 and it works now, indexing only new events.
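For anyone hitting the same problem, a minimal sketch of what the local props.conf fix described above might look like. The stanza path is an assumption based on the monitored directory in the inputs.conf shown earlier (the trailing `...` is the props.conf wildcard for any files beneath it); the original poster did not share the actual stanza, so adjust the path to match your own sources.

```
# local/props.conf
# Assumed stanza: matches everything under the monitored directory.
[source::/usr/local/splunk/etc/apps/unix/local/remote_logs/machine1/...]
# Sourcetype override mentioned in the thread.
sourcetype = syslog
# Track files by checksum of the file endpoints rather than by modtime,
# so a changed modtime alone does not cause a full reindex.
CHECK_METHOD = endpoint_md5
```

As far as I know, CHECK_METHOD is intended for source-based stanzas like the one above, but check the props.conf spec for your Splunk version before relying on this.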