How to tell Splunk to read log files only once, but keep monitoring the folder for new files?

zsimic
Path Finder

I have an ActiveBatch setup that generates many files (tens of thousands) in a folder. I'd like to have Splunk read only files freshly generated in these ActiveBatch folders. I am using the setting followTail=1 for now, and it works OK. Is there a better way to do this?

It took Splunk several hours at 100% CPU usage to go through a couple of such folders (with 30K files each). The files are generated once and are never modified after that (so "following their tail" is useless).

Is there a way to tell Splunk that? I'm looking for a setting similar to followTail that takes the following into account:

  • Splunk should look only at new files in a folder (ignoring any files that existed before the input was defined in Splunk)
  • each file is created when the corresponding job starts running, and grows for some time (anywhere from 1 second to several hours, depending on how long the job takes to complete)
  • once the corresponding job has finished, the log file is never modified again (no use tailing it anymore)
  • there are tens of thousands of such files, in several folders (tailing all of them appears to take a serious toll on splunkd)
  • each file ends with a common section that can be used to determine that no more monitoring is necessary (you can see that common section in this question)

Simeon
Splunk Employee

There is a setting for ignoring old files:

ignoreOlderThan = <time window>
* Causes the monitored input to stop checking files for updates if their modtime has passed this threshold.
  This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers
  of historical files (for example, when active log files are colocated with old files that are no longer
  being written to).
* A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.
* Value must be: <number><unit> (e.g., 7d is one week).  Valid units are d (days), m (minutes), and s (seconds).
* Default: disabled.
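
As a rough sketch of how this setting might be applied, here is a monitor stanza for inputs.conf. The path, index, and sourcetype are hypothetical placeholders, not values from this thread, and the 2d window is only an illustrative choice; pick a window longer than the longest time a file can keep growing.

```ini
# inputs.conf (sketch only; path, index, and sourcetype are hypothetical)
[monitor:///path/to/activebatch/logs]
disabled = false
# Stop checking files whose modtime is more than 2 days old.
# Files already older than this when first seen are not indexed at all.
ignoreOlderThan = 2d
index = main
sourcetype = activebatch_log
```

Note that the comparison is against each file's modification time, not against any date that appears in the file name.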

zsimic
Path Finder

Excellent! This seems to be quite suitable for this. Ignoring files older than 2 days will cover every situation in this case. Thanks!

dforstermg
New Member

I can't get 'ignoreOlderThan' to work.

[monitor:///[redacted/]
disabled = false
index = [redacted]
ignoreOlderThan=3d
blacklist = 201[0-9]-[0-1][0-8]
sourcetype = syslog

The directory is full of syslog files from rsyslog. When I run 'splunk list monitor', it's showing files dated as far back as 2017-12. (PS: the blacklist was my attempt to stop it monitoring old files.)

Like the OP above, I have files created each day, but thousands of them. I don't want the UF to 'monitor' the files, just import any new ones. Once the files are created, they are never written to.
