Getting Data In

CSV parsing fails because of long header

tmaoz
Loves-to-Learn Everything

Hello,

 

I have a CSV file with many MANY columns (in my case, 7334 columns with an average length of 145-146 characters each). This is a telemetry file exported from some networking equipment, and this is just part of the exported data...

The file has over 1000 data rows but I'm just trying to add 5 rows at the moment.

Trying to create an input for the file fails when adding more than 4175 columns, with the following error:

"Accumulated a line of 512256 bytes while reading a structured header, giving up parsing header"

I have already tried increasing all TRUNCATE settings to well above this value (by several orders of magnitude), as well as the "[kv]" limits in limits.conf. Nothing helps.

I searched the forum here but couldn't find anything relevant. A Google search yielded two results: one where people just decided that overly long headers are the user's problem and offered no resolution (not even to say it's not possible); the other went unanswered.

Couldn't find anything relevant in the Splunk online documentation or REST API specifications either.

I will also mention that processing the full data file in Python with either the standard csv parser or Pandas works just fine and very quickly. The total file size is ~92MB, which is not big at all IMHO.

My Splunk info:
Version: 9.1.2
Build: b6b9c8185839
Server: 834f30dfffad
Products: hadoop

Needless to say, the web frontend crashes entirely when I try to create the input, so I'm doing everything via the Python SDK now.

Any ideas if this can be fixed so I can add all of my data?


tscroggins
Influencer

Hi @tmaoz,

The errors are reported when onboarding data through Splunkweb; however, if you're using INDEXED_EXTRACTIONS = csv, for example, the fields should be present in the index itself. You can verify this with the walklex command after indexing your CSV file:

| walklex index=xxx type=field
| table field

You may need to increase the [kv] indexed_kv_limit setting in limits.conf or set it to 0 to disable the limit.
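For reference, that setting lives under the [kv] stanza; a minimal limits.conf fragment might look like this (the file location shown is the usual one, but adjust for your deployment):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
[kv]
# Maximum number of fields extracted at index time; 0 disables the limit
indexed_kv_limit = 0
```

Restart Splunk after editing for the change to take effect.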


tmaoz
Loves-to-Learn Everything

Actually, SplunkWeb crashes at the "preview" stage of creating a new input, so I can't even create the input that way.

That is why I'm using the Python SDK (which is basically the REST API), and I can clearly see that error message in the debug log, so it's not a SplunkWeb issue at all.


richgalloway
SplunkTrust

@tmaoz wrote:

Actually, SplunkWeb crashes in the "preview" stage of creating a new input so I can't even create the input that way.


If you're having problems using SplunkWeb, then create the input by editing inputs.conf and increase the limit by modifying limits.conf. Then restart Splunk for the changes to take effect.
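As a sketch of what that might look like (the monitor path, sourcetype name, and index are illustrative assumptions, not values from this thread):

```ini
# inputs.conf -- path, sourcetype, and index are placeholders
[monitor:///path/to/telemetry.csv]
sourcetype = telemetry_csv
index = main

# props.conf -- parse the header at index time
[telemetry_csv]
INDEXED_EXTRACTIONS = csv
```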

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

The number of rows is not an issue and Splunk regularly handles files much larger than 92MB.

The TRUNCATE setting applies to events, not headers.

Since you're using Python now, consider a scripted input to read the file and convert it to k=v format.
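A scripted input along those lines could be sketched like this (a minimal illustration using only the standard library; the quoting scheme and command-line handling are assumptions, not a tested recipe):

```python
#!/usr/bin/env python3
# Sketch: stream a wide CSV as one key="value" event line per row,
# suitable as the output of a Splunk scripted input.
import csv
import sys

def to_kv(row):
    """Render one CSV row (a dict) as a single key="value" event line."""
    return " ".join(f'{k}="{v}"' for k, v in row.items())

def stream(path):
    """Read the CSV at path and write one k=v event per data row to stdout."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            sys.stdout.write(to_kv(row) + "\n")

if __name__ == "__main__" and len(sys.argv) > 1:
    stream(sys.argv[1])
```

Since the header is never sent to Splunk as a single structured line, the header-length limit shouldn't apply; each event carries its own field names instead.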

---
If this reply helps you, Karma would be appreciated.

tmaoz
Loves-to-Learn Everything

Thanks for the reply!

Indeed, I have already converted the file into the metrics CSV format, with a separate row per timestamp per metric. That works and I can ingest the data. However, it increases the file size from 92MB to 1.4GB, which is very VERY wasteful, to be sure.
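For anyone following along, that wide-to-long conversion can be done with the standard csv module alone; the column names here ("timestamp", "metric_name", "_value") are assumptions for illustration:

```python
# Sketch: melt a wide telemetry CSV into one row per timestamp per metric.
# Column names are illustrative; adjust time_col to match your file.
import csv

def melt(in_path, out_path, time_col="timestamp"):
    """Write a (timestamp, metric_name, _value) row for every metric column."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([time_col, "metric_name", "_value"])
        for row in reader:
            ts = row.pop(time_col)
            for name, value in row.items():
                writer.writerow([ts, name, value])
```

The size blow-up the post describes follows directly from this shape: every output row repeats the timestamp and the full metric name that the wide format stored only once in the header.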

I will work the problem some more and see what I can do.
