Getting Data In

Splunk data duplicated when reading offset decreases

samhughe
Path Finder

We have a forwarder monitoring a log file and are seeing duplicated data indexed from that file (by a number of indexers within the autoLB group).

I'm seeing the following in the splunkd.log file on the forwarder:

splunkd.log:02-28-2013 13:42:34.044 +0000 INFO  WatchedFile - Will begin reading at offset=13142179 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:34.043 +0000 INFO  WatchedFile - Will begin reading at offset=13161047 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:44.092 +0000 INFO  WatchedFile - Will begin reading at offset=13138930 for file='<filename removed>'.
splunkd.log:02-28-2013 13:49:44.297 +0000 INFO  WatchedFile - Will begin reading at offset=13274923 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:34.333 +0000 INFO  WatchedFile - Will begin reading at offset=13329736 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:54.349 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:51:04.367 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:54:14.523 +0000 INFO  WatchedFile - Will begin reading at offset=13320589 for file='<filename removed>'.

As you can see, the offset at which Splunk begins reading the file occasionally decreases, which would cause the same portion of the file to be read and indexed again.

Any suggestions as to what the issue may be? (I know our indexers are a bit overloaded at present, but I'm not seeing many failed ACKs in the log file.)

Edit:
inputs.conf:

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy

outputs.conf

[tcpout]
defaultGroup = default-autolb-group
disabled = false
maxQueueSize = 6MB

[tcpout:default-autolb-group]
autoLB = true
disabled = false
server = <servernames>
useACK = true

Kate_Lawrence-G
Contributor

While not strictly necessary, you may want to add

followTail = 1

to your inputs.conf to ignore the older data, which may be causing the issue.

From the docs:

followTail = 1
Can be used to force splunk to skip past all current data for a given stanza.
* In more detail: this is intended to mean that if you start up splunk with a
stanza configured this way, all data in the file at the time it is first
encountered will not be read. Only data arriving after that first
encounter time will be read.
* This can be used to "skip over" data from old log files, or old portions of
log files, to get started on current data right away
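For example, adding it to the monitor stanza from the original post would look something like this (the path is a placeholder, and note that followTail only takes effect when Splunk first encounters the file, so it skips existing data rather than preventing re-reads of data already indexed):

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy
followTail = 1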


Kate_Lawrence-G
Contributor

Could you provide the details of how the input is configured?


samhughe
Path Finder

Thanks, details added to the original post.
