ignoreolderthan in inputs.conf vs number of files in "splunk list monitor" performance

jwquah
Path Finder

Hi All,

I'm trying to improve the performance of a Splunk instance by optimizing its configuration, e.g. setting sourcetype explicitly instead of letting it be detected automatically. One data input monitors a directory that contains about 20,000 files. I've added ignoreOlderThan = 7d to inputs.conf.

The question is:
- Does adding ignoreOlderThan in inputs.conf make Splunk ignore those files? If so, should I see fewer files being monitored in ./splunk list monitor as well as under Data Inputs in the Splunk web interface?
- Or is moving those files OUT of the monitored directory the only way to reduce the number of files being monitored?

Thank you.

0 Karma
1 Solution

woodcock
Esteemed Legend

Using ignoreOlderThan causes Splunk to ignore a file permanently once it ages past the threshold: only the filename is checked. HOWEVER, with 20k files (most of which you are "ignoring"), you still pay the OS-level cost of listing a directory that is too cluttered (calls to stat) and of walking through that list, even though you know most of the files are permanently useless to you.

To avoid all of these problems, check out my answer (and the others) here:
http://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html#ans...
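
To get a rough feel for the directory-walk cost described above, here is a minimal Python sketch that lists a directory and stats every regular file, then counts how many fall inside or outside an age window. It is not Splunk's implementation; the function name `count_by_age` and the comparison against the file modification time are assumptions made for illustration.

```python
import os
import time


def count_by_age(directory, max_age_days, now=None):
    """Return (recent, stale) counts of regular files in `directory`.

    A file counts as "stale" when its modification time is older than
    `max_age_days`. This only approximates what an age-based filter
    would skip; it is an illustration, not Splunk's own logic.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    recent = stale = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Subdirectories and symlinks are skipped in this sketch.
            if not entry.is_file(follow_symlinks=False):
                continue
            # Every file still costs a stat call, even the stale ones.
            if entry.stat(follow_symlinks=False).st_mtime >= cutoff:
                recent += 1
            else:
                stale += 1
    return recent, stale
```

Calling `count_by_age("/path/to/monitored/dir", 7)` (the path is a placeholder) shows how many of the files would still be relevant. Note that the stale files are still listed and stat'ed on every pass, which is the overhead that remains while they sit in the directory.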

0 Karma

jwquah
Path Finder

Thanks for answering. What's odd, though, is that Splunk doesn't seem to be ignoring the files. For example, out of the 20,000 files, say 1,000 are from the last 7 days.

If I set ignoreOlderThan = 7d and restart Splunk, the splunk list monitor output still shows all 20,000 files, so it doesn't look like they're ignored at all.

0 Karma

jwquah
Path Finder

OK, so I set up a test instance with a data input monitoring a directory of 206 files. The files range from May to September (yesterday).

In my inputs.conf, it's set to:

[monitor:///<dir>]
disabled = false
index = test_index
sourcetype = _json
ignoreOlderThan = 2d

There are only 13 files from the last two days, yet the data inputs web view and even ./splunk list monitor show the below.
[screenshot not preserved]

Is this expected? It seems that Splunk is still monitoring the whole directory. With few files it probably doesn't matter, but it will definitely slow down over time as more files pile up (assuming one doesn't rotate them out)...
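
For the "rotate them out" option, a minimal Python sketch of moving old files out of the monitored directory might look like the following. The function name `archive_old_files` and both directory arguments are hypothetical, and the archive directory is assumed to live outside the monitored path so the input no longer sees the moved files.

```python
import os
import shutil
import time


def archive_old_files(monitored_dir, archive_dir, max_age_days, now=None):
    """Move regular files older than `max_age_days` into `archive_dir`.

    Returns the sorted names of the files that were moved.
    Assumption: `archive_dir` is NOT inside `monitored_dir`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    os.makedirs(archive_dir, exist_ok=True)

    # Collect candidates first so the directory is not modified mid-scan.
    with os.scandir(monitored_dir) as entries:
        old_files = [
            entry for entry in entries
            if entry.is_file(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_mtime < cutoff
        ]

    moved = []
    for entry in old_files:
        shutil.move(entry.path, os.path.join(archive_dir, entry.name))
        moved.append(entry.name)
    return sorted(moved)
```

Run on a schedule (cron or similar), this keeps the monitored directory small regardless of how ignoreOlderThan behaves. The sketch does not handle name collisions in the archive directory, so treat it as a starting point only.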

0 Karma

woodcock
Esteemed Legend

I have not used btool to verify the function of ignoreOlderThan but your test surprises me. I would open a case with support.

0 Karma

jwquah
Path Finder

Sorry for not updating this. After further testing, we were able to confirm the behavior: Splunk keeps monitoring files that were already indexed even after ignoreOlderThan is set, unless the setting was in place before indexing took place. If ignoreOlderThan is added after files have been indexed, only new files are subject to it.

0 Karma

woodcock
Esteemed Legend

Which is pretty much what I was telling you (and why I pointed you to my other answer, which is a good way around this whole mess). You can flip back and forth by adding or removing the ignoreOlderThan setting: no problem. It is no surprise that Splunk is still monitoring the files to some degree, because it has to mark them as inactive and store that state somehow/somewhere. To test whether ignoreOlderThan is working, leave a file unchanged for the configured number of days, at which point Splunk will mark it to be ignored FOREVER. Then write new events to the file and confirm that those new events are not forwarded, which is the intention of the setting (but not what most people expect).
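
The second half of that test (writing a new event to a file that has already aged out) can be sketched in Python as below. The helper name `append_marker_if_stale` and the marker format are made up for illustration; the event is written as one JSON line only because the test input above uses sourcetype = _json. Whether Splunk forwards the event is exactly what you would then check by searching the index for the returned marker.

```python
import json
import os
import time
import uuid


def append_marker_if_stale(path, max_age_days, now=None):
    """Append a uniquely tagged JSON event to `path` if the file is stale.

    Returns the marker string to search for, or None when the file was
    modified within `max_age_days` and is therefore not a valid test case.
    """
    now = time.time() if now is None else now
    age_seconds = now - os.stat(path).st_mtime
    if age_seconds < max_age_days * 86400:
        return None  # file is still inside the window; nothing written

    # Hypothetical marker format, chosen only so it is easy to search for.
    marker = "ignoreolderthan-test-" + uuid.uuid4().hex
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps({"marker": marker, "written_at": now}) + "\n")
    return marker
```

If a later search for the marker in the target index returns nothing, the file was being ignored as described; if the event does show up, the setting was not applied to that file.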

0 Karma