Re: File with Header not getting indexed

Parameshwara · ‎03-30-2014

[test_header]
INDEXED_EXTRACTIONS = CSV
HEADER_FIELD_LINE_NUMBER = 1
KV_MODE = none
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
TRANSFORMS-NoHeader = test_header

The first file is indexed as expected, with only the data captured and the header ignored, but subsequent files are not indexed at all.

Parameshwara · ‎04-01-2014

At the moment I'm not using the crcSalt setting. As mentioned, I don't want any possibility of logs being re-indexed.

My working configuration...

PROPS.CONF:
[host::testcsvwithheader]
CHECK_METHOD = entire_md5
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = CSV
KV_MODE = none
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
REPORT-AutoHeader = skipheader

INPUTS.CONF
[monitor:///...]
disabled = false
followTail = 0
host = testcsvwithheader
index = test
sourcetype = testcsvwithheader
initCrcLength = 654

Parameshwara · ‎03-31-2014

I'll test out the suggested configuration.

I installed a new instance of Splunk 6.0.2 on my laptop, created a test app, and used the same configuration to index the same set of files. It WORKED! My header is 433 characters. I'm a bit stumped, but this feels like a bug.

Parameshwara · ‎03-31-2014

[monitor:...]
disabled = false
followTail = 0
host = testheader
index = testheader
sourcetype = testheader

Above is my inputs.conf. I'll check out the "CHECK_METHOD = entire_md5" option, and thanks for pointing out the correct stanza it works with.

marcoscala · ‎03-31-2014

I had a similar problem because long headers made the first 260 chars of every file identical.

I solved this in the inputs.conf like this:

[monitor:///........./appdir/SD*.ERR_*.Z]
disabled = false
followTail = 0
sourcetype = my_sourcetype
initCrcLength = 330
crcSalt = <SOURCE>

In my case, we had thousands of files being written to the same "appdir", and several times the "ERR" files were skipped because they had the same headers.

Marco
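To illustrate the reasoning behind raising initCrcLength, here is a minimal Python sketch that measures how many leading bytes two sample files share and suggests a value that reaches past that point. The helper names and the 64-byte margin are assumptions made for illustration, not anything Splunk provides; the 256 floor is simply the default mentioned in this thread.

```python
def shared_prefix_length(a: bytes, b: bytes) -> int:
    """Count how many leading bytes are identical in both inputs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suggest_init_crc_length(a: bytes, b: bytes, margin: int = 64) -> int:
    """Suggest a length that extends past the shared prefix (hypothetical helper).

    The margin is an arbitrary safety buffer; 256 is used as a floor because
    it is the default check length mentioned in the thread.
    """
    return max(256, shared_prefix_length(a, b) + margin)


# Two sample files with the same 433-byte header line and different first rows.
file_a = b"h" * 433 + b"\nrow1"
file_b = b"h" * 433 + b"\nrow2"
print(suggest_init_crc_length(file_a, file_b))
```

The only point is that the checked region has to include at least one byte where the files differ; the exact value is a judgment call.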

Parameshwara · ‎04-01-2014

I read about the crcSalt option and decided not to use it. Thanks.

miteshvohra · ‎03-31-2014

Using "CHECK_METHOD" and "initCrcLength" is better than using "crcSalt". Be cautious about using crcSalt with rolling log files: it could lead to a log file being re-indexed after it has rolled over, which in turn consumes your indexing license.

marcoscala · ‎03-31-2014

Beware that this option is valid only for a stanza like [source::filename].

miteshvohra · ‎03-31-2014

Add "CHECK_METHOD = entire_md5" to props.conf file and retry.

By default, Splunk checks the first and last 256 bytes of the file. When it finds a match, Splunk treats the file as already indexed and indexes only new data, or ignores the file if there is no new data.

http://docs.splunk.com/Documentation/Splunk/6.0.2/admin/Propsconf
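The effect of a fixed-length check on files with long shared headers can be sketched in Python. This is only an illustration: zlib.crc32 stands in for whatever checksum Splunk actually computes, the tail check is not modelled, and the function name and sample data are made up.

```python
import zlib


def initial_crc(data: bytes, init_crc_length: int = 256) -> int:
    """CRC over the first init_crc_length bytes (a stand-in, not Splunk's algorithm)."""
    return zlib.crc32(data[:init_crc_length])


# A CSV header line longer than 256 bytes, shared by both files.
header = b",".join(b"column_%03d" % i for i in range(40)) + b"\n"

file_a = header + b"row,1\n"
file_b = header + b"row,2\n"

# With the default length, only header bytes are hashed, so both files look identical.
same_with_default = initial_crc(file_a) == initial_crc(file_b)

# With a longer length (654, as in the working inputs.conf above), the data row is included.
same_with_longer = initial_crc(file_a, 654) == initial_crc(file_b, 654)

print(len(header), same_with_default, same_with_longer)
```

When the header alone is longer than the checked region, every file with that header produces the same value, which matches the symptom of the first file being indexed and later ones being skipped.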

kristian_kolb · ‎03-31-2014

What does your inputs.conf look like?
