Getting Data In

Whitelisting/Blacklisting files inside tgz files

wdhathaway
Explorer

I have a bunch of .tgz files that are being regularly uploaded to a directory and I'd like to only index a subset of the files inside the archive files.

Example archive files:

   tar tzvf archive.1.2.tgz 
     -rw-r--r--  0 wdh    wdh       948 Jan 10 09:24 app1.log
     -rw-r--r--  0 wdh    wdh       414 Jan 10 09:24 foo.log
     -rw-r--r--  0 wdh    wdh       770 Jan 10 09:24 splat.log

   tar tzvf archive.5.8.tgz
     -rw-r--r--  0 wdh    wdh       148 Jan 10 09:24 app3.log
     -rw-r--r--  0 wdh    wdh       216 Jan 10 09:24 bad.log
     -rw-r--r--  0 wdh    wdh       789 Jan 10 09:24 splat.log

From the example above, I'd like only the "splat.log" file inside archive.*.tgz to be indexed. It appears to me that the whitelist/blacklist settings for an inputs.conf stanza only apply to the archive file name, not to files inside the archive.

While I know I can have some external batch process run and pull the 'splat.log' files out, is there any way I can use whitelist/blacklist, or some other Splunk configuration mechanism to filter based on the internal filenames inside the archive files?

gelica
Communicator

Hi,
Did you ever find a way to do this? 🙂

0 Karma

robsenk
Engager

Is this an issue with 4.3 as well? I've been beating my head on this one too.

0 Karma

southeringtonp
Motivator

Not quite what you're looking for, but if nothing else you could route the events from the unwanted files to nullQueue, which discards them at index time.
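
As a rough sketch of the nullQueue approach, the usual pattern is to send everything for a sourcetype to nullQueue and then route the events you want back to indexQueue. The sourcetype name `archived_app_logs` and the transform stanza names below are placeholders, not anything from the original setup. The `splat\.log$` regex also assumes that the `source` of an event read from inside an archive ends with the member file name, which is worth confirming with a quick search before relying on it.

```
# props.conf
# "archived_app_logs" is a hypothetical sourcetype assigned to the monitored .tgz files
[archived_app_logs]
TRANSFORMS-filter_members = drop_all_members, keep_splat_log

# transforms.conf
# Step 1: send every event of this sourcetype to nullQueue (discard)
[drop_all_members]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# Step 2: send events whose source ends in splat.log back to indexQueue
# (assumes the source metadata ends with the file name inside the archive)
[keep_splat_log]
SOURCE_KEY = MetaData:Source
REGEX = splat\.log$
DEST_KEY = queue
FORMAT = indexQueue
```

The transforms run in the order listed, so the second one overrides the first for matching events. Splunk still unpacks and reads every file in the archive with this setup; the unwanted events are only dropped before they reach the index.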

jstockamp
Communicator

I've just run into this issue myself and have been beating my head against the wall trying to figure it out. It's odd that Splunk supports using the name of a file inside a tgz with a regex to specify the hostname, but it can't look inside the tarball for the blacklist. Very frustrating!

0 Karma