Getting Data In

Why are files in a monitored directory being skipped?

demondo
Engager

Hi all,

I am using the directory monitoring feature to index files below a specific path. The stanza in inputs.conf looks like this:

[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv

Looking at the Splunk data, though, I occasionally see that files placed into that directory do not get indexed. I can index each of these files manually with the oneshot CLI command, but I was hoping to figure out why they were skipped in the first place. Has anyone seen this before?

Any assistance would be appreciated.

1 Solution

tom_frotscher
Builder

Hi,

This is a common question here.

Most of the time it is caused by files with large identical headers. Your input is TSV, so this could well apply to your problem.

Splunk uses a checksum (a CRC) to determine whether a file has already been indexed. To calculate it, Splunk reads the beginning of the file (256 bytes by default; the amount is configurable) and computes the CRC over those bytes. If your files share a large identical header, several files can end up with the same CRC, and all but the first are skipped as already indexed.
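To illustrate the collision, here is a simplified sketch (not Splunk's actual implementation; it only models the idea of a CRC over the first N bytes, where Splunk's default is 256):

```python
import zlib

def file_fingerprint(path, n_bytes=256):
    """Simplified model of Splunk's file fingerprint: a CRC over the
    first n_bytes of the file. Files whose first n_bytes are identical
    get the same fingerprint and are treated as the same file."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(n_bytes))

# Two TSV files that share a long header (longer than 256 bytes)
# but contain different data afterwards:
header = b"col_a\tcol_b\tcol_c\t" + b"x" * 300 + b"\n"
with open("a.tsv", "wb") as f:
    f.write(header + b"1\t2\t3\n")
with open("b.tsv", "wb") as f:
    f.write(header + b"4\t5\t6\n")

# With the default 256-byte window the fingerprints collide, so the
# second file would be skipped as "already indexed".
assert file_fingerprint("a.tsv") == file_fingerprint("b.tsv")

# Reading past the header makes the fingerprints distinct again.
assert file_fingerprint("a.tsv", 1024) != file_fingerprint("b.tsv", 1024)
```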

You have two options:

1) Increase the number of bytes Splunk reads to calculate the CRC, via the initCrcLength setting in inputs.conf.
Details are in the inputs.conf documentation; search for "initCrcLength".

2) Add a salt to the checksummed bytes, via the crcSalt setting.

A similar problem has been discussed on Splunk Answers, and the Docs cover both settings as well (search for "initCrcLength").
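For example, either option could look like this in your stanza (a sketch only; initCrcLength and crcSalt are real inputs.conf settings, but the value 1024 is an assumption you should tune to your header size):

```ini
[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv
# Option 1: read more than the default 256 bytes when computing the CRC,
# so the checksum reaches past the shared header into unique data.
initCrcLength = 1024
# Option 2 (alternative): salt the CRC with the file's full path.
# Note: changing either setting alters the CRCs of existing files, so
# already-indexed files may be indexed a second time.
#crcSalt = <SOURCE>
```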

Greetings

Tom



demondo
Engager

Thanks Tom,

Your theory about the header is probably right. I ended up fixing the issue by setting

crcSalt = <SOURCE>

in inputs.conf. It did result in my files being double-indexed after resetting the Splunk server. That was a bit of a pain to resolve, but once I did, the issue appears to be fixed. Thanks for the tip!

Best,
Rob Rolnick
