Hi,
Using Splunk 6.5.1, with either monitoring, indexing, and search all on a single machine,
or a dedicated forwarder feeding the indexer/search head machine.
I've set up monitoring on a directory where a binary updates a CSV file all day long:
2017.07.06.jobs
That CSV file has 31 fields on each line.
For the sourcetype, I'm using the built-in "csv" complemented with a TIMESTAMP_FIELDS = SUBMITTIME.
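For reference, a minimal props.conf sketch of that setup might look like this (the stanza assumes the built-in "csv" sourcetype, which already ships with INDEXED_EXTRACTIONS = csv; only the TIMESTAMP_FIELDS line is the local addition):

```
# props.conf -- sketch, not verified against this deployment
[csv]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = SUBMITTIME
```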
The data loaded into my index is corrupted: sometimes a line is only half-read, so only the first half of the fields is populated. The second half of the line is then treated as a new line, with its values landing in the first half of the fields; in other words, I see EXECHOST name values showing up in the PROJECT field.
I cannot find any warning of interest in the splunkd.log file,
apart maybe from:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.
Any ideas?
Since the 'Resetting fd' message is info-level, it's probably not a big deal, but you may want to try putting a FIELD_NAMES
attribute in props.conf to see if it keeps Splunk from re-reading the header.
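Something along these lines -- note the field names here are placeholders taken from the symptoms described above, and the real file has 31 fields, so the full list would need to be spelled out:

```
# props.conf -- sketch; FIELD_NAMES must list all 31 columns in order
[csv]
INDEXED_EXTRACTIONS = csv
FIELD_NAMES = PROJECT,EXECHOST,SUBMITTIME
TIMESTAMP_FIELDS = SUBMITTIME
```

With FIELD_NAMES set, Splunk no longer needs to re-extract the header from the file, which should quiet the "Resetting fd" message.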
As for the partial events, I've seen that happen with multi-line events where the extra lines took a while to write. Adjusting the time_before_close
setting usually helps with that. Hard to believe it would take more than 3 seconds (the default) for your app to write a single line, though.
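time_before_close lives in the monitor stanza in inputs.conf; a sketch (the path here is hypothetical, substitute the actual monitored directory):

```
# inputs.conf -- sketch; path is an assumption
[monitor:///path/to/jobs]
sourcetype = csv
time_before_close = 10
```

This tells the tailing processor to wait that many seconds after the last write before treating the file as closed, giving a slow writer time to finish the line.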
Hi, adding FIELD_NAMES = in props.conf did fix that message in splunkd.log.
I've also tried increasing time_before_close up to 65 seconds, and I am still seeing corrupted lines being read.