I have created a directory to store log files that I pull from a remote machine. A cron job runs every x minutes and calls a script that rsyncs the files over. Splunk is configured to monitor this directory. Using auth.log as an example, Splunk indexes that file as expected the first time it appears in the directory. After that, any time the file changes, Splunk re-indexes the entire file. So if there were 500 events initially, and the cron job runs and the file now contains 510 events (10 more than last time), Splunk will show 1010 events.
I also discovered that I can trigger Splunk to re-index the entire file simply by running touch on it, leaving everything else about the file intact.
What I don't understand is that the *nix app, which automatically monitors /var/log on the Splunk machine, behaves as I would expect. Only new events are added as a file changes, and using touch on any of those files does not cause the file to be completely re-indexed.
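To make the difference in counts concrete, here is a minimal sketch of the two behaviors described above. It only models the arithmetic of the event totals; it is not how Splunk is implemented, and the function names are made up for illustration.

```python
def full_reindex(indexed_total, file_event_count):
    """Model of what I see on the rsynced directory:
    every change adds all events in the file again."""
    return indexed_total + file_event_count


def incremental_index(indexed_total, already_read, file_event_count):
    """Model of what I see on /var/log:
    only events past the previously read position are added."""
    return indexed_total + (file_event_count - already_read)


first_pass = 500   # events in auth.log when Splunk first indexed it
second_pass = 510  # events in auth.log after the next cron run

observed = full_reindex(first_pass, second_pass)
expected = incremental_index(first_pass, first_pass, second_pass)

print(observed)  # 1010, what Splunk shows for the rsynced file
print(expected)  # 510, what I would expect it to show
```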
I have tried using rsync in append mode. I have also tried the atomic-rsync Perl script, which rsyncs the files to a temporary directory and then, once everything has transferred, renames them over the old files. Nothing I have tried so far has worked.
I am new to Splunk, so I have to assume I am doing something wrong, but I really need to figure out what that might be, because having my events constantly re-indexed in full is not acceptable.
My inputs.conf for the directory in question is:
[monitor:///usr/local/splunk/etc/apps/unix/local/remote_logs/machine1]
disabled = false
followTail = 0
host =
host_segment = 9
index = os
I'm using Splunk version 4.1.5, build 85165, on FreeBSD 8.1-RELEASE.