Getting Data In

Problem with CSV file monitoring importing corrupted data: end-of-line not respected

gedworksplunk
Engager

Hi,

I'm using Splunk 6.5.1, either monitoring, indexing, and searching directly on a single machine,
or with a dedicated forwarder feeding the indexer/search head machine.

I've set up monitoring of a directory where a binary updates a CSV file all day long:
2017.07.06.jobs
That CSV file has 31 fields on each line like:

FIELDS: ID,PROJECT,USER,OSGROUP,DIR,ENV,TOOL,JOBNAME,PRIORITY,RESOURCES,SUBMITHOST,EXECHOST,SUBMITTIME,STARTTIME,ENDTIME

For the sourcetype, I'm using the built-in "csv", complemented with TIMESTAMP_FIELDS = SUBMITTIME.

The data loaded into my index is corrupted: sometimes a line is only half-read, so only the first half of the fields is populated. The second half of the line is then treated as a new line, so its values land in the first half of the fields. For example, I see EXECHOST host names in the PROJECT field.

I cannot find any warning of interest in the splunkd.log file,
apart maybe from:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.

Any ideas?


richgalloway
SplunkTrust

Since the 'Resetting fd' message is info-level, it's probably not a big deal, but you may want to try putting a FIELD_NAMES attribute in props.conf to see if it keeps Splunk from re-reading the header.
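
A minimal sketch of what such a props.conf stanza might look like. The props.conf setting for explicit header names is FIELD_NAMES. The stanza name jobs_csv is a hypothetical custom sourcetype, and only the 15 column names quoted in the question are listed here, so the remaining columns of the 31 would need to be appended in file order:

```ini
# props.conf
# Assumption: a custom sourcetype named jobs_csv (hypothetical name).
# With INDEXED_EXTRACTIONS, this stanza needs to be on the instance that
# monitors the file (the forwarder, when one is used).
[jobs_csv]
INDEXED_EXTRACTIONS = csv
# Only the columns quoted in the question; append the rest in file order.
FIELD_NAMES = ID,PROJECT,USER,OSGROUP,DIR,ENV,TOOL,JOBNAME,PRIORITY,RESOURCES,SUBMITHOST,EXECHOST,SUBMITTIME,STARTTIME,ENDTIME
TIMESTAMP_FIELDS = SUBMITTIME
```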

As for the partial events, I've seen that happen with multi-line events where the extra lines took a while to write. Adjusting the time_before_close setting in inputs.conf usually helps with that. Hard to believe it would take 3 seconds (the default) for your app to write a single line, though.
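
For reference, time_before_close is set per input in inputs.conf and is expressed in seconds. A sketch of a monitor stanza that raises it, where the directory path, the 10-second value, and the jobs_csv sourcetype name are all placeholders rather than values from this thread:

```ini
# inputs.conf on the instance that monitors the file
# Assumption: placeholder path; substitute the real directory.
[monitor:///path/to/job/logs/*.jobs]
sourcetype = jobs_csv
# Seconds the file must go unmodified before the monitor closes it at EOF.
time_before_close = 10
```

A restart of the monitoring instance is typically needed for changes to inputs.conf to take effect.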


gedworksplunk
Engager

Hi, setting FIELD_NAMES in props.conf did get rid of that message in splunkd.log.

I've also tried increasing time_before_close up to 65, and I am still seeing corrupted lines being read.
