Getting Data In

Why are files in a monitored directory being skipped?

demondo
Engager

Hi all,

I am using the directory monitoring feature to index files below a specific path. The stanza in inputs.conf looks like this:

[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv

Looking at the Splunk data, though, I occasionally see that files placed into that directory do not get indexed. I can index each of these files manually with the oneshot CLI command, but I was hoping to figure out why they were skipped in the first place. Has anyone seen this before?

Any assistance would be appreciated.

1 Solution

tom_frotscher
Builder

Hi,

This is a common question here.

Most of the time it is caused by files with large identical headers. Your input is TSV, so this could well apply to your problem.

Splunk uses a checksum (a CRC) to determine whether a file has already been indexed. To calculate it, Splunk reads the beginning of the file (256 bytes by default; the amount is configurable) and computes the CRC over those bytes. If your files share a large identical header, several files can end up with the same CRC, and all but the first are skipped as already indexed.
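To illustrate the collision, here is a simplified sketch (not Splunk's actual implementation; it only models the idea of a CRC over the first N bytes, where Splunk's default is 256):

```python
import zlib

def file_fingerprint(path, n_bytes=256):
    """Simplified model of Splunk's file fingerprint: a CRC over the
    first n_bytes of the file. Files whose first n_bytes are identical
    get the same fingerprint and are treated as the same file."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(n_bytes))

# Two TSV files that share a long header (longer than 256 bytes)
# but contain different data afterwards:
header = b"col_a\tcol_b\tcol_c\t" + b"x" * 300 + b"\n"
with open("a.tsv", "wb") as f:
    f.write(header + b"1\t2\t3\n")
with open("b.tsv", "wb") as f:
    f.write(header + b"4\t5\t6\n")

# With the default 256-byte window the fingerprints collide, so the
# second file would be skipped as "already indexed".
assert file_fingerprint("a.tsv") == file_fingerprint("b.tsv")

# Reading past the header makes the fingerprints distinct again.
assert file_fingerprint("a.tsv", 1024) != file_fingerprint("b.tsv", 1024)
```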

You have two options:

1) Increase the number of bytes Splunk reads to calculate the CRC, via the initCrcLength setting in inputs.conf.
Details are in the inputs.conf documentation; search for "initCrcLength".

2) Add a salt to the checksummed bytes, via the crcSalt setting.

A similar problem has been discussed on Splunk Answers, and the Docs cover both settings as well (search for "initCrcLength").
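For example, either option could look like this in your stanza (a sketch only; initCrcLength and crcSalt are real inputs.conf settings, but the value 1024 is an assumption you should tune to your header size):

```ini
[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv
# Option 1: read more than the default 256 bytes when computing the CRC,
# so the checksum reaches past the shared header into unique data.
initCrcLength = 1024
# Option 2 (alternative): salt the CRC with the file's full path.
# Note: changing either setting alters the CRCs of existing files, so
# already-indexed files may be indexed a second time.
#crcSalt = <SOURCE>
```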

Greetings

Tom



demondo
Engager

Thanks Tom,

Your theory about the header is probably right. I ended up fixing the issue by setting

crcSalt = <SOURCE>

in inputs.conf. It did result in my files being double-indexed after resetting the Splunk server. That was a bit of a pain to resolve, but once I did, the issue appears to be fixed. Thanks for the tip!

Best,
Rob Rolnick
