Getting Data In

IIS log file data duplication - "Checksum for seekptr didn't match, will re-read entire file"

mParticle
Explorer

I have a base install of one indexer and a few UFs. The indexer and UFs are all version 6.0, build 182037 (the UFs are on Windows Server 2012, the indexer is on Ubuntu).

In the UF's .\etc\system\local\inputs.conf I have a basic stanza:

[monitor://C:\inetpub\logs\LogFiles\W3SVC1]
sourcetype = iis
index = iis_logs
disabled = false

After making the change above and restarting the UF, it starts reading the IIS logs, then logs the following entries:

12-02-2013 11:54:39.390 -0500 INFO  WatchedFile - Checksum for seekptr didn't match, will re-read entire file='C:\inetpub\logs\LogFiles\W3SVC1\u_ex131202.log'.
12-02-2013 11:54:39.390 -0500 INFO  WatchedFile - Will begin reading at offset=0 for file='C:\inetpub\logs\LogFiles\W3SVC1\u_ex131202.log'.
12-02-2013 11:54:39.437 -0500 INFO  WatchedFile - Resetting fd  to re-extract header.

and then a couple of minutes later the three lines above repeat, then again, and again, duplicating data, eating into the indexing quota, and chewing through disk space. I'm not the only person with this issue, judging from a quick search through Answers; here is one such post. I tried the workaround in that post and it worked, but since Splunk 6.0 changed the way IIS logs are handled (see this product announcement), I thought I'd try to use the new way instead of hacking around it and (probably) having it break once this gets fixed.
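
For reference, the "old way" amounts to monitoring with a sourcetype that does not go through the new structured (INDEXED_EXTRACTIONS) parsing on the forwarder. This is a minimal sketch only, not the exact content of the linked post: iis_plain is an illustrative name, and you would still need matching props.conf settings on the indexer for timestamps and field extraction.

# inputs.conf on the UF - sketch only
[monitor://C:\inetpub\logs\LogFiles\W3SVC1]
# iis_plain has no INDEXED_EXTRACTIONS stanza in props.conf on the UF, so the
# 6.0 structured-parsing path (where this re-read appears to be triggered) is
# bypassed and parsing happens on the indexer, as it did before 6.0
sourcetype = iis_plain
index = iis_logs
disabled = false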

Does anyone have any suggestions? An official fix maybe?

Thanks in advance!

1 Solution

ekost
Splunk Employee

There is an issue that causes duplicate IIS events to appear when using a new feature in Splunk 6.0. The Answers post here discusses the issue.


ekost
Splunk Employee

No, not yet. The core issue is still being investigated. A workaround has been identified for use with version 6 forwarders and is being validated.


mParticle
Explorer

Thanks. Do you have any details on when it is coming out?


buckeye07
Engager

I have the same issue, but my logs are other W3C formats and the files are much larger, so the impact is greater for me. No answers yet, but I'll report back if I figure something out.


stephanyespence
New Member

Us too. Anyone have a solution yet??


bruceclarke
Contributor

Did you ever find a solution to this? We're running into the same issue, and it's causing us to forward gigabytes of data from what should be only about 10 MB daily.


ShaneNewman
Motivator

I think everyone has this issue. The checksum is actually there to make sure that if a log rolls over and keeps the same name, Splunk doesn't mistake the new file for one it has already indexed. Chances are, if you see this message, the files are extremely small and won't really impact your indexing license unless that single UF is monitoring hundreds of thousands of files (not going to happen).
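
(For context on the mechanism above: the forwarder fingerprints each file by a CRC of its first bytes, and the relevant inputs.conf settings look like the sketch below. This is illustrative only, not a fix for the bug in this thread, and changing crcSalt on an existing input will itself cause re-indexing.)

[monitor://C:\inetpub\logs\LogFiles\W3SVC1]
# initCrcLength: how many leading bytes are hashed to identify a file (default 256);
# raising it helps when many files start with an identical header
initCrcLength = 1024
# crcSalt = <SOURCE> mixes the full path into the CRC; use with care, since
# changing it causes files that were already indexed to be read again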

We have hundreds of files being monitored via a Heavy Forwarder, and we get this message any time it is restarted. Unless we clean the fishbucket before starting the instance, the most I have seen re-indexed after a restart is about 15 MB.
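
(Roughly, "cleaning the fishbucket" on a full instance or heavy forwarder means something like the commands below, run from $SPLUNK_HOME/bin with the instance stopped. Treat the exact btprobe syntax as an assumption to verify against the docs for your version.)

# wipe the whole fishbucket - everything monitored is re-indexed on the next start
./splunk clean eventdata -index _thefishbucket

# or reset the tracking record for a single file
./splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file <path to monitored file> --reset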


ShaneNewman
Motivator

I agree, if it is re-indexing files of that size, that is a bug.


mParticle
Explorer

Well, the files are small, but not THAT small: 5-10 MB each. I know Splunk can calculate a hash of a file so it knows whether the file rolled over, but these files do NOT roll over; they are only appended to, and this behavior was different (read "working properly") prior to version 6, so I'd say this is a bug. Also, it only takes a minute or two to ingest a 10 MB file, so multiply that by 24 hours and then by a few web servers, and you'll quickly see how it can become an issue for anyone who doesn't have a large Splunk license.
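
(To put rough, illustrative numbers on that: a ~10 MB file re-read every couple of minutes is on the order of 300 MB an hour, or roughly 7 GB a day from a single log file, before multiplying by the number of web servers.)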

Bottom line, I think this is a bug and should be addressed.
