Index gzipped files without .gz extension

chris
Motivator

Hi,

I am trying to index gzipped files that do not have the .gz extension on a Windows universal forwarder.

First I got the following messages in splunkd.log:

11-18-2019 15:06:33.698 +0100 INFO  TailReader - Ignoring file 'D:\path\to\log\messages_xyz' due to: binary
11-18-2019 15:06:33.698 +0100 WARN FileClassifierManager - The file 'D:\path\to\log\messages_xyz' is invalid. Reason: binary.

Looking at how Splunk handles gzipped files in the props.conf under system/default, I put together the following props.conf:

[mysourcetype]
invalid_cause = archive
NO_BINARY_CHECK = true
is_valid = False

[source::D:\path\to\log\*]
#Default
#unarchive_cmd = _auto
#On linux
#unarchive_cmd = gzip -cd -
#On windows
unarchive_cmd = splunk-compresstool -g

Running splunk-compresstool by hand seems to work:
.\splunk-compresstool.exe -g 'xyz'

2019-03-27 16:01:34.000 device kern.info kernel: udevd version 124 started
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:06:44.000 device kern.warn kernel: JFFS2 warning: (793) jffs2_sum_write_data: Not enough space for summary, padsize = -376

This is what I see in splunkd.log:

11-18-2019 16:55:47.351 +0100 INFO  ArchiveProcessor - Handling file=xyz
11-18-2019 16:55:47.351 +0100 INFO  ArchiveProcessor - reading path=xyz (seek=0 len=211534)
11-18-2019 16:55:47.402 +0100 INFO  ArchiveProcessor - Finished processing file 'xyz', removing from stats

And this is what I see in metrics.log:

11-18-2019 17:03:47.471 +0100 INFO  Metrics - group=per_source_thruput, ingest_pipe=0, series="xyz", kbps=0, eps=0.03224797474898443, kb=0, ev=1, avg_age=0, max_age=0

Although metrics.log reports ev=1, I do not see any events in the index (and there should be more than one event per file).

Is there a way to see what the ArchiveProcessor is doing?

Shouldn't Splunk just recognize file types without depending on the extension?

Regards Chris


chris
Motivator

This is the props.conf that worked for me:

 [source::D:\\path\\to\\log\\*]
 #Default
 #unarchive_cmd = _auto
 #On linux
 #unarchive_cmd = gzip -cd -
 #On windows
 unarchive_cmd = splunk-compresstool -g
 invalid_cause = archive
 NO_BINARY_CHECK = true
 is_valid = False

anwarmian
Communicator

Thanks so much for your post. I am surprised that the following did not work. "_auto" is not the default value, so I expected that setting unarchive_cmd to "_auto" would make Splunk extract the archived file automatically, unless a file extension is required. In my case, Splunk is ingesting a gzip file without an extension, but the ingested data is not in text format. When I tested a file with a .gz extension, Splunk recognized it and decompressed it properly. That tells me that Splunk requires an archived file to have an extension.

unarchive_cmd = _auto
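
To check, independently of Splunk, whether an extensionless file really is gzip-compressed, one option is to look at its first two bytes: gzip streams start with the magic number 1f 8b. The following is a minimal Python sketch and not anything Splunk ships; the function names looks_like_gzip and preview_lines are made up for illustration, and it assumes the decompressed content is text.

```python
import gzip

# The two-byte magic number that opens every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def preview_lines(path, limit=5):
    """Decompress the file and return up to `limit` lines for a sanity check."""
    lines = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lines.append(line.rstrip("\n"))
            if len(lines) >= limit:
                break
    return lines
```

If looks_like_gzip returns True and preview_lines shows readable log lines, the file itself is fine and the remaining problem is most likely in how the forwarder is configured to unpack it.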


chris
Motivator

It turns out that I forgot to escape the backslashes (\) in the Windows path in props.conf.
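
As a small illustration of that fix, here is a hedged Python sketch that doubles every backslash in a Windows path before it is placed in a [source::...] stanza header. The helper names escape_source_path and source_stanza are hypothetical, and the sketch assumes that doubling the backslashes is the only escaping needed, which is all this thread establishes.

```python
def escape_source_path(path):
    """Double every backslash so the path can be used in a source stanza."""
    return path.replace("\\", "\\\\")


def source_stanza(path_pattern):
    """Build a props.conf [source::...] header from a Windows path pattern."""
    return "[source::" + escape_source_path(path_pattern) + "]"


# A raw string keeps the single backslashes of the original Windows path.
print(source_stanza(r"D:\path\to\log\*"))
# prints: [source::D:\\path\\to\\log\\*]
```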
