I have been trying to understand when it is best practice to use PREAMBLE_REGEX, FIELD_HEADER_REGEX, and/or HEADER_FIELD_LINE_NUMBER when indexing files with headers, and I couldn't find answers to the following in the documentation.
For example, I'm trying to parse the following sample output from TZWorks:
usp - full ver: 0.52; Copyright (c) TZWorks LLC
License #-------------- is authenticated for business use and registered to --------------
run time: -------------- [UTC]; Host: -------------
"cmdline: C:\--------------\usp64.exe -csvl2t -partition C:"
note: When comparing timestamps from manual analysis use option [-show_other_times] to see full range of timestamps recovered
date,time,timezone,MACB,source,sourcetype,type,user,host,short,desc,version,filename,inode,notes,format,extra
$sampledata...
I set up the following lines in props.conf (among other settings):
[usp]
PREAMBLE_REGEX = ^(usp|License|run|\"cmdline|\s*$)
FIELD_HEADER_REGEX = ^date
HEADER_FIELD_LINE_NUMBER = 7
These settings seem to work as long as the event files are consistent with the sample above. However, when no events are found, neither the header line ("date,time,timezone... etc.") nor the $sampledata exists, and Splunk indexes the first 5 lines as an actual event. Is there a better general approach that would also handle files that contain no events?
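One way to sanity-check the PREAMBLE_REGEX above outside of Splunk is to run it against the sample lines. The following is only a rough sketch: it uses Python's re module rather than Splunk's regex engine, the helper name unmatched_lines is made up for illustration, and it says nothing about how Splunk combines this setting with the other two. It simply lists the sample lines the pattern does not match.

```python
import re

# Pattern copied from the props.conf stanza in the question.
PREAMBLE_REGEX = re.compile(r'^(usp|License|run|\"cmdline|\s*$)')

# Sample lines copied from the question (redactions left as-is).
SAMPLE_LINES = [
    r'usp - full ver: 0.52; Copyright (c) TZWorks LLC',
    r'License #-------------- is authenticated for business use and registered to --------------',
    r'run time: -------------- [UTC]; Host: -------------',
    r'"cmdline: C:\--------------\usp64.exe -csvl2t -partition C:"',
    r'note: When comparing timestamps from manual analysis use option [-show_other_times] to see full range of timestamps recovered',
    r'date,time,timezone,MACB,source,sourcetype,type,user,host,short,desc,version,filename,inode,notes,format,extra',
]


def unmatched_lines(lines):
    """Return the lines that the preamble pattern does NOT match (hypothetical helper)."""
    return [line for line in lines if not PREAMBLE_REGEX.search(line)]


if __name__ == "__main__":
    for line in unmatched_lines(SAMPLE_LINES):
        # Print only the start of each line to keep the output short.
        print(line[:40])
```

Run against the sample, the lines left unmatched are the "note:" line and the "date,..." header line, so the "note:" line is worth accounting for when tuning the pattern.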
The docs say the text matched by FIELD_HEADER_REGEX is not included in the header, so your current setting shouldn't work. The fact that it does work tells me that setting is being overridden by one of the other two.
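To make the point about FIELD_HEADER_REGEX concrete, here is a small sketch of the behaviour as the docs describe it, i.e. the header starts after the matched pattern. This is plain Python, not Splunk's actual implementation, and the function name header_after_pattern is hypothetical.

```python
import re


def header_after_pattern(line, pattern):
    """Split the text that follows the regex match into comma-separated field names.

    Mimics the documented rule that the matched text itself is not part of
    the header. Returns None if the pattern does not match the line.
    """
    match = re.search(pattern, line)
    if match is None:
        return None
    return line[match.end():].split(',')


if __name__ == "__main__":
    header_line = 'date,time,timezone,MACB'
    # With ^date the match consumes "date", leaving ",time,timezone,MACB".
    print(header_after_pattern(header_line, r'^date'))
```

Under that reading, "^date" would consume the first field name and leave an empty first field, which is why the setting as written should not produce a correct header on its own.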