Getting Data In

Splunk unable to handle double-quoted CSV?

vectorsc
Explorer

Example Line:

"Stuff to be, together as one item",nextvalue,andanother,andso-on

When using Splunk's automatic header detection with DELIMS=",", this breaks: the quoted field is split at its internal comma, the remaining fields shift back by one, and the extracted values become:

Field1: "Stuff to be

Field2: together as one item"

Field3: nextvalue

Field4: andanother

Field5: andso-on

Is there any way around this other than a massive regex? And if you did write a massive regex (we actually did), how would you make it cope with the following:

LINE1: "Stuff to be, together as one item",nextvalue,andanother,andso-on

LINE2: "Stuff to be, together as one item, and this stuff too, and there might be 10 of them...",nextvalue,andanother,andso-on
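For reference (this is a generic sketch in Python, not a Splunk extraction), a single pattern can handle both lines because it treats a field as either one whole double-quoted run, however many commas it contains, or a run of non-comma characters:

```python
import re

# A field is either a double-quoted run (commas allowed inside) or a
# run of non-comma characters. Empty fields are dropped by this sketch.
CSV_FIELD = re.compile(r'"([^"]*)"|([^,]+)')

def split_csv(line):
    # findall yields (quoted, plain) pairs; exactly one is non-empty.
    return [q or p for q, p in CSV_FIELD.findall(line)]

line1 = '"Stuff to be, together as one item",nextvalue,andanother,andso-on'
line2 = ('"Stuff to be, together as one item, and this stuff too, '
         'and there might be 10 of them...",nextvalue,andanother,andso-on')
```

Both lines split into four fields, with the quoted first field kept intact regardless of how many commas it contains.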


dmr195
Communicator

If you don't use the auto header detection (and presumably you stopped using this to try your regex solution) then, as far as I can see, CSV with double quoted fields works fine.

I've got a setup that has the following in props.conf:


[source::prelertanomaly]
CHARSET = UTF-8
MAX_TIMESTAMP_LOOKAHEAD = 10
TIME_FORMAT = %s
REPORT-prelert = prelertanomaly

the following in transforms.conf:


[prelertanomaly]
DELIMS = ","
FIELDS = _time, prelertreferencetime, prelertkey, prelertbucketsize, prelertfieldname, prelertfieldvalue, prelertbaselinerate, prelertcurrentrate, prelertbaselinemean, prelertcurrentmean, sparkline, prelertanomalyfactor, prelertprobability, prelertanomalysearchterms, prelertanomalywhereclause, prelertexamples, prelertmetricfield

and it quite happily handles records of this form:

1362673800,1362673800,unixcpu,300,count,count,3.995050,10,3.995050,10.000000,"##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0",0.000000,100.000000,,,,

Notice that the sparkline there has lots of commas in it inside the surrounding double quotes, and Splunk doesn't break it up. With my config and the specific event above Splunk extracted these fields:

host=davids-macbook-pro
sourcetype=stash
source=prelertanomaly
prelertanomalyfactor=0.000000
prelertbaselinerate=3.995050
prelertbucketsize=300
prelertcurrentrate=10
prelertfieldname=count
prelertfieldvalue=count
prelertprobability=100.000000
prelertreferencetime=1362673800
sparkline=##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0
time=1362673800
prelertkey=unixcpu
prelertcurrentmean=10.000000
prelertbaselinemean=3.995050

This works with Splunk 4.3.2 and 5.0.1 at least.
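As a sanity check outside Splunk (my illustration, not part of the original config), Python's csv module parses that same event identically: the quoted sparkline survives as one field and the record yields the 17 fields the FIELDS list expects:

```python
import csv

# The event from above; ',1,0,0' * 10 rebuilds the sparkline body.
event = ('1362673800,1362673800,unixcpu,300,count,count,3.995050,10,'
         '3.995050,10.000000,'
         '"##__SPARKLINE__##' + ',1,0,0' * 10 + '",'
         '0.000000,100.000000,,,,')

fields = next(csv.reader([event]))
# fields[10] is the whole sparkline, commas and all; the four trailing
# commas become four empty fields, for 17 fields in total.
```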

One annoyance I have seen with the way Splunk parses CSV input data: at index time it expects double quotes that should remain part of a field value to be escaped with backslashes, whereas in the CSV that custom search commands take as input and emit as output, double quotes are escaped by doubling them (as Excel does).
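The two escaping conventions can be illustrated with Python's csv module (this is my illustration of the two conventions, not Splunk's parser): the default reader handles doubled quotes, while `doublequote=False` plus `escapechar` handles the backslash style.

```python
import csv

# Doubled quotes (the Excel / search-command convention):
doubled = '"he said ""hi"" to me",second'
row_doubled = next(csv.reader([doubled]))  # doublequote=True is the default

# Backslash-escaped quotes (the convention index-time parsing reportedly expects):
backslashed = '"he said \\"hi\\" to me",second'
row_backslashed = next(csv.reader([backslashed],
                                  doublequote=False, escapechar='\\'))

# Both decode to the same field value: he said "hi" to me
```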


vectorsc
Explorer

Oddly, the inputs and configuration we were using were identical to yours, and it still didn't work in 4.3.2.

That said, we were able to replace the commas inside the double-quoted fields with another character (the pipe symbol), and that worked just fine. Most aggravating. I will likely re-attempt this in 5.x at a later date.
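That preprocessing workaround can be sketched as follows (a minimal sketch, assuming the quoted fields never contain embedded quotes): swap commas for pipes only inside the double-quoted sections, so a plain DELIMS="," split no longer breaks them apart.

```python
import re

def commas_to_pipes(line):
    # Replace commas with pipes only within double-quoted sections;
    # text outside the quotes is left untouched.
    return re.sub(r'"[^"]*"',
                  lambda m: m.group(0).replace(',', '|'),
                  line)

before = '"Stuff to be, together as one item",nextvalue,andanother,andso-on'
after = commas_to_pipes(before)
# after: '"Stuff to be| together as one item",nextvalue,andanother,andso-on'
```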
