Getting Data In

Why is my sourcetype not parsing as CSV, and why am I getting two events: one with the header and one with the raw event?

gbronner_rbc
Explorer

I'm trying to parse a CSV file, but I'm getting two events: one with a header and one with a raw event. It is driving me nuts. I've tried deleting and reloading the data multiple times. The file has 2 lines, so at least it is small.

The file is being loaded via the CLI:

splunk add oneshot <filename> -sourcetype backtestMetaData -index grb_test
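For context, a two-line file of the shape this config implies might look like the following (column names and values are hypothetical; the only thing the thread confirms is a startTime column, per TIMESTAMP_FIELDS below):

startTime,testName,result
2015-06-12 09:30:00,backtest1,PASS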

On my server, props.conf is in ./etc/apps/<app_name>/local/props.conf.
I've looked for 'backtest' in other props.conf files, but don't see it in any of them. There is nothing special on the forwarder.

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
category = Structured
description = format for csv from testResults.csv
disabled = false
pulldown_type = true

[source::.../testResults.csv]
sourcetype=backtestMetaData
1 Solution

gbronner_rbc
Explorer

It appears that when loading header-structured data (e.g. CSV, TSV) from a universal forwarder, one must edit props.conf on the forwarder in order to tell Splunk that the sourcetype has to be handled differently.

Example is:

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1

However, those settings will differ slightly from the props.conf on the indexer, which may still need to apply custom timestamp rules.
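Putting the two together, the setup described in this thread looks roughly like this (paths are illustrative; the indexer stanza is trimmed to the settings shown elsewhere in the thread):

On the universal forwarder, in etc/apps/<app_name>/local/props.conf:

[backtestMetaData]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1

On the indexer, in etc/apps/<app_name>/local/props.conf:

[backtestMetaData]
KV_MODE = none
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime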

I found this document to be a very useful explanation of the process:
https://wiki.splunk.com/Community:HowIndexingWorks

This question was also useful:
http://answers.splunk.com/answers/153488/does-a-universal-forwarder-ever-read-props-conf.html


nnmiller
Contributor

Try setting the following in the props.conf for the CSV:

CHECK_FOR_HEADER = true
HEADER_FIELD_LINE_NUMBER = 1

I've had to set these before for CSV files where the header does not appear on the first line. It could be that you have extraneous invisible characters (a UTF-8 BOM, for example) at the beginning of the file that the parser is not handling.

If that doesn't do it, then check that your line break after the header is correct for your OS using a hex editor or similar tool.
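If you don't have a hex editor handy, od (standard on most Unix systems) can make the line endings and any leading bytes visible; the filename here is the one from the question:

# Dump the start of the file, printing non-printable bytes as escapes/octal.
# A line ending in \r \n is CRLF (Windows); \n alone is LF (Unix).
# A UTF-8 BOM shows up as 357 273 277 at offset 0000000.
od -c testResults.csv | head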

det0n8r
Explorer

I ran into a similar problem that was solved by adding a stanza to transforms.conf that ignores the header row, for example:

[setheadernull]
REGEX = ^(Header1   Header2   Header3)
DEST_KEY = queue
FORMAT = nullQueue
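Note that for a stanza like this to take effect, it also has to be referenced from props.conf for the sourcetype; the class name (nullheader) here is an arbitrary placeholder:

[backtestMetaData]
TRANSFORMS-nullheader = setheadernull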

gbronner_rbc
Explorer

Interestingly, adding a oneshot with no sourcetype results in the file getting parsed as a CSV, which is nice except that it misses my custom TIMESTAMP_FIELDS setting.
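That is, the same command as above with the -sourcetype flag dropped:

splunk add oneshot <filename> -index grb_test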


gbronner_rbc
Explorer

Interestingly, it works when I use the web API, but not from the universal forwarder.
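If "web API" here means the REST interface rather than the Splunk Web upload wizard, the equivalent oneshot call looks something like this (host, port, and credentials are placeholders):

# POST the file path to the oneshot input endpoint on the management port
curl -k -u admin:changeme https://localhost:8089/services/data/inputs/oneshot \
    -d name=/path/to/testResults.csv \
    -d index=grb_test \
    -d sourcetype=backtestMetaData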


gbronner_rbc
Explorer

Anyone know if maxDist overrides TIMESTAMP_FIELDS?

-bash-4.1$ /app/wwd0dev/splunk/bin/splunk btool props list backtestMetaData --user=gbronner --app=backtest

[backtestMetaData]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = startTime
TRANSFORMS =
TRUNCATE = 10000
category = Custom
description = format for csv from testResults.csv
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =
