Getting Data In

If you had a choice, would you use CSV or XML files for Splunk to ingest?

dhaffner
Path Finder

I'm looking at a log source that will be sent once a day to a forwarder as one large file. From there it will be ingested by Splunk and sent on to the indexer. Is it better to use XML or CSV? Maybe even JSON?

Here's what we want:

  1. A cron job runs on the application server to obtain the past day's incidents and write them to a CSV- or XML-formatted file (a rough sketch of this step follows the list)
  2. The file is transferred to our global forwarder through some secure mechanism. (SCP?)
  3. The forwarder monitors the location of the file and feeds the data into Splunk
  4. The process is repeated each day, and the file on the forwarder is overwritten by the new file that gets sent across

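For concreteness, here's a rough sketch of step 1 in Python. Everything in it is an assumption about our setup: get_incidents() is a hypothetical helper, and the column names and output path are made up.

    import csv

    # Assumed schema; the real column names would come from our incident data.
    FIELDS = ["timestamp", "incident_id", "severity", "message"]

    def export_daily_csv(path="/var/tmp/incidents.csv"):
        """Dump the past day's incidents to a single CSV file for pickup."""
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            for incident in get_incidents():  # hypothetical data source
                writer.writerow(incident)
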
Thanks!

1 Solution

gkanapathy
Splunk Employee

I would recommend CSV over XML. It's compact and parsing it is inexpensive and reliable. Splunk fields just hold strings and numbers, so the complexity of using formats (such as XML and JSON) that can handle composite objects is unnecessary.

The downside of CSV is that you pretty much need to know ahead of time how many fields you'll need and what they are, and every event must have the same (possibly empty) set of fields.
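
To illustrate that fixed-schema constraint, a minimal sketch with Python's csv module (the columns and events are invented): restval fills in missing fields as empty strings, and an event carrying an undeclared field raises an error.

    import csv, sys

    columns = ["time", "user", "action"]  # fixed, known ahead of time
    events = [
        {"time": "2011-01-27 21:34:32", "user": "ted"},  # no "action"
        {"time": "2011-01-27 21:35:10", "user": "bob", "action": "login"},
    ]

    # restval="" writes missing fields as empty strings; the default
    # extrasaction="raise" rejects events with fields outside the schema.
    writer = csv.DictWriter(sys.stdout, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(events)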

Better than either for Splunk purposes would be a format that Splunk's automatic key-value (KV) extraction can handle, e.g., field1="value1", field2="value2", field3="value three", field4="more". This gives you the flexibility to set fields differently per event.
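
A minimal sketch of emitting events in that style (the field names are just the placeholders from the example); note that values containing spaces need to be quoted:

    def format_event(**fields):
        # Quote every value so spaces (e.g. "value three") stay intact.
        return ", ".join('%s="%s"' % (k, v) for k, v in fields.items())

    print(format_event(field1="value1", field2="value2",
                       field3="value three", field4="more"))
    # field1="value1", field2="value2", field3="value three", field4="more"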

Multiline strings can be handled with a format like the one described at http://answers.splunk.com/questions/3231/escaping-characters-in-an-event/3549#3549, though you could also use XML.
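
I haven't reproduced the linked approach here, but one common technique is to escape embedded newlines so each event stays on one line; a sketch:

    def escape_multiline(value):
        # Escape backslashes first, then newlines, so the mapping reverses cleanly.
        return value.replace("\\", "\\\\").replace("\n", "\\n")

    print('message="%s"' % escape_multiline("first line\nsecond line"))
    # message="first line\nsecond line"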


dhaffner
Path Finder

All good insights. Thanks to you both!

Paolo_Prigione
Builder

I'd recommend CSV, or even something taking advantage of Splunk's default key-value extraction, like:

2011/01/27 21:34:32.432 host severity=ERROR userId=ted transaction=w4534rp234 message="..... ... ..."

That way, with zero config, you'd have all the fields you explicitly listed in the log.

JSON and XML are complex formats, and dealing with them is not as straightforward as good old CSV or similar.
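
As a sketch, you could produce a line in that layout with Python's standard logging module (the field names mirror the example above; milliseconds are dropped for brevity):

    import logging, socket

    # Layout: timestamp, host, then the key=value pairs from the message.
    logging.basicConfig(
        format="%(asctime)s " + socket.gethostname() + " %(message)s",
        datefmt="%Y/%m/%d %H:%M:%S")
    log = logging.getLogger("app")
    log.error('severity=ERROR userId=ted transaction=w4534rp234 message="..."')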
