Splunk Search

loading a zip file for lookup?

smileyge
Path Finder

I have a ~250MB CSV file that I want to use in a lookup. It takes forever to load into memory when I run the search, so I want to try compressing it to see whether that helps or hurts. I was at Splunk>Live yesterday, and a couple of folks suggested this approach. The problem: when I compress the file and try to add a new lookup, I get an error saying the file is binary and not gzipped. I've tried Windows compression, GNU gzip, and GNU gzip with Unix-style line endings in the CSV, with the extensions .gz, .zip, .csv.gz, and .csv.zip, and I always get the same error. The file is ~50MB compressed. Any suggestions?

The little help text on the page where you pick the file even mentions loading a zip, so I don't understand why this isn't working.

0 Karma

smileyge
Path Finder

According to that post, [^x00-x7F] is the regex for finding a "special character" in the file. That finds commas, and I have millions of those, since it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress it I get the error. Am I throwing good time after bad? Can anyone comment on whether it's even worth trying to use a .gz file in a lookup instead of an uncompressed CSV for performance reasons?
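
A note on why that pattern matches commas: as posted, it has no backslashes, so a regex engine reads it as a negated character class covering "x", the range "0" through "x", "7", and "F". A comma falls outside that range, so it matches. The short Python sketch below compares the pattern as posted with `[^\x00-\x7F]`, which is only an assumption about what the linked post intended (any character outside 7-bit ASCII). The sample data is made up for illustration.

```python
import re

# Pattern exactly as it appears in the post (no backslashes).
as_posted = re.compile(r"[^x00-x7F]")

# Assumed intended pattern: any character outside 7-bit ASCII.
non_ascii = re.compile(r"[^\x00-\x7F]")

# Made-up CSV sample with one non-ASCII character.
sample = "host,owner\nweb01,José\n"

posted_hits = as_posted.findall(sample)
ascii_hits = non_ascii.findall(sample)

print(posted_hits)  # includes commas and newlines
print(ascii_hits)   # only the accented character
```

If the backslashed form is what was meant, a plain ASCII CSV would produce no matches at all, however many commas it contains.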

0 Karma

ChrisG
Splunk Employee

You should check your CSV for any special characters, beyond the line endings. See this previous Answers posting.

0 Karma

ChrisG
Splunk Employee

That is...odd. Uploading a compressed CSV file should work fine. Just to troubleshoot the basics: can you uncompress the file and open it successfully? I'm assuming you've confirmed that there's nothing wrong with the compression itself, but want to confirm. And I don't have any other real ideas at the moment. 😕
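
One way to run that basic check outside of Splunk is sketched below in Python. It inspects the leading bytes to tell a gzip stream (starts with 1f 8b) from a ZIP archive (starts with "PK"), which are different formats, and then decompresses and reads the first CSV row. The helper names `sniff_compression` and `first_row_of_gzipped_csv` are hypothetical, and nothing here says how Splunk itself decides that a file is "not gzipped"; it only confirms what the file actually is.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"     # local file header of a non-empty ZIP archive


def sniff_compression(data):
    """Guess the container format from the leading bytes (hypothetical helper)."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    if data[:4] == ZIP_MAGIC:
        return "zip"
    return "unknown"


def first_row_of_gzipped_csv(data):
    """Decompress gzip bytes and return the first CSV row as a list of strings."""
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8", newline="") as fh:
        return next(csv.reader(fh))


# Made-up sample standing in for the real lookup file.
sample = gzip.compress(b"host,owner\nweb01,alice\n")
print(sniff_compression(sample))
print(first_row_of_gzipped_csv(sample))
```

For a real file, read the first few bytes with `open(path, "rb")` and pass them to `sniff_compression`. A file produced by Windows compression would be expected to report "zip" rather than "gzip", regardless of the extension it is given.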

0 Karma
