Getting Data In

Why is my sourcetype not parsing as CSV? I'm getting two events: one with the header and one with the raw event

gbronner_rbc
Explorer

I'm trying to parse a CSV file, but I'm getting two events: one with the header and one with the raw event. It's driving me nuts. I've tried deleting and reloading the data multiple times. The file has only 2 lines, so at least it's small.

The file is being loaded via the CLI:

splunk add oneshot <filename> -sourcetype backtestMetaData -index grb_test

On my server, props.conf lives in ./etc/apps/<app_name>/local/props.conf. I've looked for 'backtest' in other props.conf files, but don't see any other references. Nothing special is set on the forwarder.

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
category = Structured
description = format for csv from testREsutls.csv
disabled = false
pulldown_type = true

[source::.../testResults.csv]
sourcetype=backtestMetaData
1 Solution

gbronner_rbc
Explorer

It appears that when loading header-structured data (e.g., CSV, TSV) from a universal forwarder, one must edit props.conf on the forwarder itself in order to tell Splunk that the sourcetype has to be handled differently.

Example is:

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1
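
(On a universal forwarder, this stanza typically goes in $SPLUNK_HOME/etc/system/local/props.conf or an app's local directory, and the forwarder needs a restart to pick it up.)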

However, the settings will be slightly different from the props.conf on the indexer, which may need to apply custom timestamp rules.
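
For illustration, the timestamp side of that indexer stanza might look like the following. The TIME_FORMAT value here is an assumption; the actual format of the startTime field never appears in this thread:

[backtestMetaData]
TIMESTAMP_FIELDS = startTime
TIME_FORMAT = %Y-%m-%d %H:%M:%S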

I found this document to be a very useful explanation of the process:
https://wiki.splunk.com/Community:HowIndexingWorks

This question was also useful:
http://answers.splunk.com/answers/153488/does-a-universal-forwarder-ever-read-props-conf.html


nnmiller
Contributor

Try setting the following in the props.conf for the CSV:

CHECK_FOR_HEADER = true
HEADER_FIELD_LINE_NUMBER = 1

I've had to set these before for CSV files where the header does not appear on the first line. It could be that you have extraneous invisible characters at the beginning of the file that the parser is not handling.

If that doesn't do it, then check that your line break after the header is correct for your OS using a hex editor or similar tool.
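
For example, on a Unix host with xxd installed (od -c works too), the first couple of lines of hex output will show the line endings:

xxd testResults.csv | head -2

A Windows-style line ending appears as 0d 0a (\r\n), a Unix one as 0a alone, and a UTF-8 byte-order mark shows up as ef bb bf at the very start of the file.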

det0n8r
Explorer

I ran into a similar problem that was solved by adding a stanza to transforms.conf that ignores the header row, for example:

[setheadernull]
REGEX = ^(Header1   Header2   Header3)
DEST_KEY = queue
FORMAT = nullQueue
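
Note that a transforms.conf stanza only takes effect once it is referenced from props.conf. A minimal sketch, assuming the sourcetype from this thread:

[backtestMetaData]
TRANSFORMS-setheadernull = setheadernull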

gbronner_rbc
Explorer

Interestingly, adding a oneshot with no sourcetype results in the file getting parsed as a CSV, which is nice except that it misses my custom TIMESTAMP_FIELDS setting.
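
That is, the same command as before with the -sourcetype flag dropped:

splunk add oneshot <filename> -index grb_test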


gbronner_rbc
Explorer

Interestingly, it works when I use the web API, but not from the universal forwarder.


gbronner_rbc
Explorer

Anyone know if the max distance (maxDist) overrides TIMESTAMP_FIELDS?

-bash-4.1$ /app/wwd0dev/splunk/bin/splunk btool props list backtestMetaData --user=gbronner --app=backtest

[backtestMetaData]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
TRANSFORMS =
TRUNCATE = 10000
category = Custom
description = format for csv from testREsutls.csv
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =
