Getting Data In

Splunk data duplicated when reading offset decreases

samhughe
Path Finder

We have a forwarder monitoring a log file and are seeing duplicated data indexed from that file (by a number of indexers within the autoLB group).

I'm seeing the following in the splunkd.log file on the forwarder:

splunkd.log:02-28-2013 13:42:34.044 +0000 INFO  WatchedFile - Will begin reading at offset=13142179 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:34.043 +0000 INFO  WatchedFile - Will begin reading at offset=13161047 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:44.092 +0000 INFO  WatchedFile - Will begin reading at offset=13138930 for file='<filename removed>'.
splunkd.log:02-28-2013 13:49:44.297 +0000 INFO  WatchedFile - Will begin reading at offset=13274923 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:34.333 +0000 INFO  WatchedFile - Will begin reading at offset=13329736 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:54.349 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:51:04.367 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:54:14.523 +0000 INFO  WatchedFile - Will begin reading at offset=13320589 for file='<filename removed>'.

As you can see, the offset at which Splunk begins reading the file occasionally decreases, which would cause the same portion of the file to be read and indexed again.

Any suggestions as to what the issue may be? (I know our indexers are a bit overloaded at present, but I'm not seeing many failed ACKs in the log file.)

Edit:
inputs.conf:

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy

outputs.conf

[tcpout]
defaultGroup = default-autolb-group
disabled = false
maxQueueSize = 6MB

[tcpout:default-autolb-group]
autoLB = true
disabled = false
server = <servernames>
useACK = true

Kate_Lawrence-G
Contributor

While not strictly necessary, you may want to add

followTail = 1

to your inputs.conf to ignore the older data, which may be causing the issue.

From the docs:

followTail = 1
Can be used to force splunk to skip past all current data for a given stanza.
* In more detail: this is intended to mean that if you start up splunk with a
stanza configured this way, all data in the file at the time it is first
encountered will not be read. Only data arriving after that first
encounter time will be read.
* This can be used to "skip over" data from old log files, or old portions of
log files, to get started on current data right away
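For example, adding it to the monitor stanza from the original post would look something like this (the path is a placeholder, and note that followTail only takes effect when Splunk first encounters the file, so it skips existing data rather than preventing re-reads of data already indexed):

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy
followTail = 1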


Kate_Lawrence-G
Contributor

Could you provide the details of how the input is configured?


samhughe
Path Finder

Thanks, details added to the original post.
