Getting Data In

Why is my sourcetype not parsing as CSV, and why am I getting two events: one with the header and one with the raw event?

gbronner_rbc
Explorer

I'm trying to parse a CSV file, but I'm getting two events: one with a header and one with a raw event. It is driving me nuts. I've tried deleting and reloading the data multiple times. The file has 2 lines, so at least it is small.

The file is being loaded via the CLI:

splunk add oneshot <filename> -sourcetype backtestMetaData -index grb_test
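For context, a two-line file of the shape this config implies might look like the following (column names and values are hypothetical; the only thing the thread confirms is a startTime column, per TIMESTAMP_FIELDS below):

startTime,testName,result
2015-06-12 09:30:00,backtest1,PASS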

On my server, props.conf is in ./etc/apps/<app_name>/local/props.conf.
I've looked for 'backtest' in other props.conf files, but don't see it in any of them. There is nothing special on the forwarder.

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
category = Structured
description = format for csv from testResults.csv
disabled = false
pulldown_type = true

[source::.../testResults.csv]
sourcetype=backtestMetaData
1 Solution

gbronner_rbc
Explorer

It appears that when loading header-structured data (e.g. CSV, TSV) from a universal forwarder, one must edit props.conf on the forwarder in order to tell Splunk that the sourcetype has to be handled differently.

Example is:

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1

However, those settings will differ slightly from the props.conf on the indexer, which may still need to apply custom timestamp rules.
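Putting the two together, the setup described in this thread looks roughly like this (paths are illustrative; the indexer stanza is trimmed to the settings shown elsewhere in the thread):

On the universal forwarder, in etc/apps/<app_name>/local/props.conf:

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1

On the indexer, in etc/apps/<app_name>/local/props.conf:

[backtestMetaData]
KV_MODE = none
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime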

I found this document to be a very useful explanation of the process:
https://wiki.splunk.com/Community:HowIndexingWorks

This question was also useful:
http://answers.splunk.com/answers/153488/does-a-universal-forwarder-ever-read-props-conf.html


nnmiller
Contributor

Try setting the following in the props.conf for the CSV:

CHECK_FOR_HEADER = true
HEADER_FIELD_LINE_NUMBER = 1

I've had to set these before for CSV files where the header does not appear on the first line. It could be that you have extraneous invisible characters (a UTF-8 BOM, for example) at the beginning of the file that the parser is not handling.

If that doesn't do it, then check that your line break after the header is correct for your OS using a hex editor or similar tool.
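If you don't have a hex editor handy, od (standard on most Unix systems) can make the line endings and any leading bytes visible; the filename here is the one from the question:

# Dump the start of the file, printing non-printable bytes as escapes/octal.
# A line ending in \r \n is CRLF (Windows); \n alone is LF (Unix).
# A UTF-8 BOM shows up as 357 273 277 at offset 0000000.
od -c testResults.csv | head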

det0n8r
Explorer

I ran into a similar problem that was solved by adding a stanza to transforms.conf that ignores the header row, for example:

[setheadernull]
REGEX = ^(Header1   Header2   Header3)
DEST_KEY = queue
FORMAT = nullQueue
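Note that for a stanza like this to take effect, it also has to be referenced from props.conf for the sourcetype; the class name (nullheader) here is an arbitrary placeholder:

[backtestMetaData]
TRANSFORMS-nullheader = setheadernull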

gbronner_rbc
Explorer

Interestingly, adding a oneshot with no sourcetype results in the file getting parsed as a CSV, which is nice except that it misses my custom TIMESTAMP_FIELDS setting.
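That is, the same command as above with the -sourcetype flag dropped:

splunk add oneshot <filename> -index grb_test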


gbronner_rbc
Explorer

Interestingly, it works when I use the web API, but not from the universal forwarder.
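If "web API" here means the REST interface rather than the Splunk Web upload wizard, the equivalent oneshot call looks something like this (host, port, and credentials are placeholders):

# POST the file path to the oneshot input endpoint on the management port
curl -k -u admin:changeme https://localhost:8089/services/data/inputs/oneshot \
    -d name=/path/to/testResults.csv \
    -d index=grb_test \
    -d sourcetype=backtestMetaData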


gbronner_rbc
Explorer

Anyone know if maxDist overrides TIMESTAMP_FIELDS?

-bash-4.1$ /app/wwd0dev/splunk/bin/splunk btool props list backtestMetaData --user=gbronner --app=backtest

[backtestMetaData]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
TRANSFORMS =
TRUNCATE = 10000
category = Custom
description = format for csv from testResults.csv
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =
