Getting Data In

Splunk unable to handle double-quoted CSV?

vectorsc
Explorer

Example Line:

"Stuff to be, together as one item",nextvalue,andanother,andso-on

When using Splunk's automatic header detection with DELIMS=",", this breaks: the quoted field is split at its internal comma, the remaining fields shift back by one, and the extracted values become:

Field1: "Stuff to be

Field2: together as one item"

Field3: nextvalue

Field4: andanother

Field5: andso-on

Is there any way around this other than a massive regex? And if you did write a massive regex (we actually did), how would you make it cope with the following:

LINE1: "Stuff to be, together as one item",nextvalue,andanother,andso-on

LINE2: "Stuff to be, together as one item, and this stuff too, and there might be 10 of them...",nextvalue,andanother,andso-on
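For reference (this is a generic sketch in Python, not a Splunk extraction), a single pattern can handle both lines because it treats a field as either one whole double-quoted run, however many commas it contains, or a run of non-comma characters:

```python
import re

# A field is either a double-quoted run (commas allowed inside) or a
# run of non-comma characters. Empty fields are dropped by this sketch.
CSV_FIELD = re.compile(r'"([^"]*)"|([^,]+)')

def split_csv(line):
    # findall yields (quoted, plain) pairs; exactly one is non-empty.
    return [q or p for q, p in CSV_FIELD.findall(line)]

line1 = '"Stuff to be, together as one item",nextvalue,andanother,andso-on'
line2 = ('"Stuff to be, together as one item, and this stuff too, '
         'and there might be 10 of them...",nextvalue,andanother,andso-on')
```

Both lines split into four fields, with the quoted first field kept intact regardless of how many commas it contains.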


dmr195
Communicator

If you don't use the auto header detection (and presumably you stopped using this to try your regex solution) then, as far as I can see, CSV with double quoted fields works fine.

I've got a setup that has the following in props.conf:


[source::prelertanomaly]
CHARSET = UTF-8
MAX_TIMESTAMP_LOOKAHEAD = 10
TIME_FORMAT = %s
REPORT-prelert = prelertanomaly

the following in transforms.conf:


[prelertanomaly]
DELIMS = ","
FIELDS = _time, prelertreferencetime, prelertkey, prelertbucketsize, prelertfieldname, prelertfieldvalue, prelertbaselinerate, prelertcurrentrate, prelertbaselinemean, prelertcurrentmean, sparkline, prelertanomalyfactor, prelertprobability, prelertanomalysearchterms, prelertanomalywhereclause, prelertexamples, prelertmetricfield

and it quite happily handles records of this form:

1362673800,1362673800,unixcpu,300,count,count,3.995050,10,3.995050,10.000000,"##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0",0.000000,100.000000,,,,

Notice that the sparkline there has lots of commas in it inside the surrounding double quotes, and Splunk doesn't break it up. With my config and the specific event above Splunk extracted these fields:

host=davids-macbook-pro
sourcetype=stash
source=prelertanomaly
prelertanomalyfactor=0.000000
prelertbaselinerate=3.995050
prelertbucketsize=300
prelertcurrentrate=10
prelertfieldname=count
prelertfieldvalue=count
prelertprobability=100.000000
prelertreferencetime=1362673800
sparkline=##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0
time=1362673800
prelertkey=unixcpu
prelertcurrentmean=10.000000
prelertbaselinemean=3.995050

This works with Splunk 4.3.2 and 5.0.1 at least.
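As a sanity check outside Splunk (my illustration, not part of the original config), Python's csv module parses that same event identically: the quoted sparkline survives as one field and the record yields the 17 fields the FIELDS list expects:

```python
import csv

# The event from above; ',1,0,0' * 10 rebuilds the sparkline body.
event = ('1362673800,1362673800,unixcpu,300,count,count,3.995050,10,'
         '3.995050,10.000000,'
         '"##__SPARKLINE__##' + ',1,0,0' * 10 + '",'
         '0.000000,100.000000,,,,')

fields = next(csv.reader([event]))
# fields[10] is the whole sparkline, commas and all; the four trailing
# commas become four empty fields, for 17 fields in total.
```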

One annoyance I have seen with the way Splunk parses CSV input data: at index time it expects double quotes that should remain part of a field value to be escaped with backslashes, whereas in the CSV that custom search commands take as input and emit as output, double quotes are escaped by doubling them (as Excel does).
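The two escaping conventions can be illustrated with Python's csv module (this is my illustration of the two conventions, not Splunk's parser): the default reader handles doubled quotes, while `doublequote=False` plus `escapechar` handles the backslash style.

```python
import csv

# Doubled quotes (the Excel / search-command convention):
doubled = '"he said ""hi"" to me",second'
row_doubled = next(csv.reader([doubled]))  # doublequote=True is the default

# Backslash-escaped quotes (the convention index-time parsing reportedly expects):
backslashed = '"he said \\"hi\\" to me",second'
row_backslashed = next(csv.reader([backslashed],
                                  doublequote=False, escapechar='\\'))

# Both decode to the same field value: he said "hi" to me
```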


vectorsc
Explorer

Oddly, the inputs and configuration we were using were identical to yours, and it still didn't work in 4.3.2.

That said, we were able to replace the commas inside the double-quoted fields with another character (the pipe symbol), and that worked just fine. Most aggravating. I will likely re-attempt this in 5.x at a later date.
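That preprocessing workaround can be sketched as follows (a minimal sketch, assuming the quoted fields never contain embedded quotes): swap commas for pipes only inside the double-quoted sections, so a plain DELIMS="," split no longer breaks them apart.

```python
import re

def commas_to_pipes(line):
    # Replace commas with pipes only within double-quoted sections;
    # text outside the quotes is left untouched.
    return re.sub(r'"[^"]*"',
                  lambda m: m.group(0).replace(',', '|'),
                  line)

before = '"Stuff to be, together as one item",nextvalue,andanother,andso-on'
after = commas_to_pipes(before)
# after: '"Stuff to be| together as one item",nextvalue,andanother,andso-on'
```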
