Getting Data In

Blacklist/ignore files by size

TonyLeeVT
Builder

When monitoring a directory for files (using inputs.conf), is it possible to blacklist or ignore files over a certain size? Say, for instance, a few files 100 MB or larger get dropped in. Splunk usually errors out after processing these anyway. Can I skip processing these larger files? Thanks in advance.


woodcock
Esteemed Legend

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that creates selective soft links in it, pointing back to the real directory (/source/path/), for any file that meets your "keep" criteria, like this:

*/5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -size -100M | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}
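For completeness, the forwarder would then monitor the link directory rather than the source. A minimal inputs.conf sketch (index and sourcetype here are placeholders, not from the original post) might look like:

```ini
# Monitor the symlink directory instead of /source/path/.
# Monitor inputs follow symlinks by default (followSymlink = true),
# so Splunk reads the link targets in /source/path/.
[monitor:///destination/path]
disabled = false
index = main
sourcetype = your_sourcetype
```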

Don't forget to set up a second cron job to delete the broken soft links (whose source files have been deleted), too, or you will end up with tens of thousands of stale links here as well.
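That second cleanup job could look like the sketch below. The paths and schedule are the hypothetical ones from above, and `-xtype l` (which matches dangling symlinks) is a GNU find extension, so check your platform's find first:

```shell
#!/bin/sh
# Sketch of the cleanup cron job: remove symlinks whose targets are gone.
# With GNU find, -xtype l matches only symlinks that point to nothing.
# The crontab entry itself would be something like:
#   */10 * * * * /bin/find /destination/path/ -maxdepth 1 -xtype l -delete

# Self-contained demo in a temp directory:
dest=$(mktemp -d)
touch "$dest/real.log"
ln -s "$dest/real.log" "$dest/good.lnk"     # target exists: kept
ln -s "$dest/gone.log" "$dest/broken.lnk"   # dangling: deleted

find "$dest" -maxdepth 1 -xtype l -delete

[ -L "$dest/good.lnk" ] && [ ! -e "$dest/broken.lnk" ] && echo "broken link removed"
rm -rf "$dest"
```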


TonyLeeVT
Builder

Thank you for adding a workaround, sir. 🙂 I will give it a try, but I will still leave this issue open until Splunk adds a supported solution, such as a file-size parameter in inputs.conf. Thanks again.


somesoni2
Revered Legend

I don't think there is a native way to ignore files based on their size. On the other hand, Splunk can monitor files much larger than 100 MB, so could you tell us more about "Splunk usually errors after processing these anyway"?
