Getting Data In

If you had a choice, would you use CSV or XML files for Splunk to eat?

dhaffner
Path Finder

I'm looking at a log source that will be sent to a forwarder once a day as one large file, then ingested by Splunk and sent on to the indexer. Is it better to use XML or CSV? Maybe even JSON?

Here's what we want:

  1. A cron job runs on the application server to obtain the past day's incidents and write them to a CSV- or XML-formatted file
  2. The file is transferred to our global forwarder via some secure mechanism (SCP?)
  3. The forwarder monitors the file's location and feeds the data into Splunk
  4. The process repeats each day, and the file on the forwarder is overwritten by the new file that gets sent across
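As a rough sketch of step 1, a cron-invoked script could pull the day's incidents and write them as CSV. The `fetch_incidents` function and its fields are hypothetical stand-ins for whatever the application server actually exposes:

```python
import csv

# Hypothetical: in the real job this would query the application's API
# for the past day's incidents; stubbed here with static rows.
def fetch_incidents():
    return [
        {"time": "2024-01-01T10:00:00", "severity": "ERROR", "message": "disk full"},
        {"time": "2024-01-01T11:30:00", "severity": "WARN", "message": "retry queue growing"},
    ]

# CSV requires a fixed field set known up front.
FIELDS = ["time", "severity", "message"]

def write_daily_csv(path):
    rows = fetch_incidents()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The resulting file would then be SCP'd to the forwarder's monitored directory (step 2).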

Thanks!

1 Solution

gkanapathy
Splunk Employee

I would recommend CSV over XML. It's compact and parsing it is inexpensive and reliable. Splunk fields just hold strings and numbers, so the complexity of using formats (such as XML and JSON) that can handle composite objects is unnecessary.

The downside of CSV is that you generally need to know ahead of time how many fields you'll need and what they are, and every event must carry the same (possibly empty) set of fields.

Better than either for Splunk purposes would be a format that Splunk's automatic key-value (KV) extraction can handle, e.g., field1="value1", field2="value2", field3="value three", field4="more". This gives you the flexibility of setting fields differently per event.

Multiline strings can be handled with an escaping scheme like the one described here: http://answers.splunk.com/questions/3231/escaping-characters-in-an-event/3549#3549 — though you could also fall back to XML for those.
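To illustrate the KV format above, here is a minimal sketch of a helper that renders an event as key="value" pairs, quoting every value and escaping embedded quotes and backslashes so values with spaces stay intact (the function name is my own, not a Splunk API):

```python
def to_kv(event: dict) -> str:
    """Render a dict as key="value" pairs suitable for auto KV extraction."""
    parts = []
    for key, value in event.items():
        # Escape backslashes first, then embedded double quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{text}"')
    return " ".join(parts)

print(to_kv({"field1": "value1", "field3": "value three"}))
# field1="value1" field3="value three"
```

Because each event is just whatever keys you choose to include, different events can carry different fields with no schema change.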


dhaffner
Path Finder

All good insights. Thanks to you both!


Paolo_Prigione
Builder

I'd recommend CSV, or even something taking advantage of Splunk's default key-value extraction, like:

2011/01/27 21:34:32.432 host severity=ERROR userId=ted transaction=w4534rp234 message="..... ... ..."

That way, with zero configuration, you'd get all the fields you explicitly listed in the log.

JSON and XML are complex formats, and dealing with them is not as straightforward as good old CSV or similar.
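A line in that shape could be assembled like this sketch, which puts the timestamp first (so Splunk's default timestamp recognition can find it) and only quotes values containing spaces; the function and field names are illustrative, not any standard API:

```python
from datetime import datetime, timezone

def log_line(host: str, **fields) -> str:
    # Timestamp leads the line so Splunk's timestamp recognition
    # can pick it up without extra configuration.
    ts = datetime.now(timezone.utc).strftime("%Y/%m/%d %H:%M:%S.%f")[:-3]
    kv = " ".join(
        f'{key}="{value}"' if " " in str(value) else f"{key}={value}"
        for key, value in fields.items()
    )
    return f"{ts} {host} {kv}"

# e.g. log_line("web01", severity="ERROR", userId="ted",
#               transaction="w4534rp234", message="connection refused")
```

Appending such lines to a file the forwarder monitors would make the listed fields searchable with no props/transforms configuration.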
