Getting Data In

How to configure line breaking for a CSV file where line breaks are signified by a semicolon?

dablackgoku1234
New Member

Hi everyone,

I have a CSV file where the line breaks are signified by a semicolon ;. I am wondering how one would parse this CSV with the "line break" being a different character. Example:

Number, Score;  1 , 44.5678690273 ;11 , 60.0795233081 ;  14 , 13.6359924845 ;  16 , 44.6169376811 ;  17 , 47.6782506507 ; 

I tried using:

HEADER_FIELD_LINE_NUMBER=1
FIELD_NAMES=Number, Score
BREAK_ONLY_BEFORE=;
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
LINE_BREAK=;
MUST_BREAK_AFTER=;
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
pulldown_type=true

However, it does not break the events at the semicolons.

0 Karma

jkat54
SplunkTrust
SplunkTrust

Try this:

 CHARSET=AUTO
 INDEXED_EXTRACTIONS=csv
 KV_MODE=none
 SHOULD_LINEMERGE=false
 LINE_BREAKER=;
 NO_BINARY_CHECK=true
 pulldown_type=true

Make sure this props.conf is at the source of the data such as the forwarder.

0 Karma

dablackgoku1234
New Member

Without specifying the FIELD_NAMES, I get a No results found. Please change source type, adjust source type settings, or check your source file.

However, specifying FIELD_NAMES still does not parse the semicolons properly. It would think

Score; 1
44.5678690273 ;11
60.0795233081 ; 14

are field values.

0 Karma

jkat54
SplunkTrust
SplunkTrust

Did you change line break to line breaker?

0 Karma

jkat54
SplunkTrust
SplunkTrust

Maybe try a
SEDCMD-semicolon="s/;/\n\r/g"

0 Karma

dablackgoku1234
New Member

Adding SEDCMD-semicolon="s/;/\n\r/g" did not change the result. Do I need to do anything special to enable SEDCMD?

I've added the line to my props.conf file. I've also tried SEDCMD-replace=s/;/\r\n/g and the results are the same.

0 Karma

jkat54
SplunkTrust
SplunkTrust

I believe the indexed_extractions is overriding the sedcmd, and line_breaker.

You will probably have to disable indexed_extractions and use EXTRACT-name to extract the values to field names, and then discard the header with sedcmd

LINE_BREAKER=;
SEDCMD-headerRemoval = s/Number\s+\,\s+Score//g
EXTRACT-fields = ^(?<Number>\d+)\s+\,\s+(?<Score>\d+\.\d+|\d+)  #gets whole integers and factions
0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...