Getting Data In

Performance impact of using ignoreOlderThan on forwarder

stanwin
Contributor

Hi

So we have a server that writes out thousands of files a day.
Over the course of two months we can accumulate 70K+ files.
We can't enforce an aggressive archival policy on the project team due to some other retention constraints.
There are also dat.gz files, which may incur an even bigger CPU hit once you take the thousands of older files into account.

ignoreOlderThan = <nonnegative integer>[s|m|h|d]
* Causes the monitored input to stop checking files for updates if their
  modtime has passed this threshold.  This improves the speed of file tracking
  operations when monitoring directory hierarchies with large numbers of
  historical files (for example, when active log files are colocated with old
  files that are no longer being written to).
  * As a result, do not select a cutoff that could ever occur for a file
    you wish to index.  Take downtime into account!
    Suggested value: 14d, which means 2 weeks
* A file whose modtime falls outside this time window when seen for the first
  time will not be indexed at all.
* Default: 0, meaning no threshold.
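
For reference, a minimal monitor stanza using this setting might look like the following (the path and sourcetype are made-up examples, not from the thread):

    [monitor:///var/log/app/*.log]
    ignoreOlderThan = 14d
    sourcetype = app_logs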

If ignoreOlderThan is used in inputs.conf, is the check done when the input is monitored, or is an internal blacklist set up?
I think there might be an internal list, because the following is true for ignoreOlderThan from what I read in other answers:

Once a file is ignored, it never comes back to being monitored, even if its timestamp is updated. The only way to undo this is to remove the setting and restart Splunk.

Now the question is: how will this affect the performance of the Splunk forwarder over time (with thousands of blacklist entries being set up internally)?

1 Solution

stanwin
Contributor

ssievert

thanks for the reply!!

Currently the forwarder is also very slow at startup (finishing its initial flight checks and starting to send data to the indexers) after a restart, obviously due to the large number of files that need to be examined for CRC, last-read position, etc.
The project team has been made aware that this can be rectified only if they archive to the closest window, and that it is not a Splunk issue.

Meanwhile, I was wondering: if we do go ahead with ignoreOlderThan, will we still see some slowness due to the large number of blacklisted files that have to be referenced? (Now clear that this comes from _fishbucket.)

OR

is the blacklist lookup from _fishbucket a relatively quick process (let's say 5K blacklisted files? 😄 .. that's way too many still, I know!)

Constraints prevent the archiving from happening regularly on that box.

But depending on the above, we can present an argument for enforcing the archiving.

s2_splunk
Splunk Employee

I cannot credibly quantify the impact, but ignoreOlderThan will still incur an fstat to retrieve the last file modification time. So the UF is not just looking up the file path and last modification time from the _fishbucket; it also has to fstat the actual file to determine whether the two timestamps match and decide whether it needs to read the file or not.
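
For illustration only, here is a rough conceptual sketch in Python of the two checks involved; this is not Splunk's actual implementation, and the 14d cutoff is just the suggested value from the spec quoted above:

    import os, time

    IGNORE_OLDER_THAN = 14 * 24 * 3600   # hypothetical 14d cutoff, in seconds

    def should_read(path, fishbucket_mtime):
        # The forwarder still stat()s every candidate file on disk ...
        st = os.stat(path)                        # <- the per-file fstat cost
        if time.time() - st.st_mtime > IGNORE_OLDER_THAN:
            return False                          # past the cutoff: skip it
        # ... and only then compares against what _fishbucket recorded
        return st.st_mtime != fishbucket_mtime
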
I would definitely push for a cron script/logrotate that moves, zips (or at least renames) old files to a different location that no longer matches the [monitor://] expression.
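
A minimal sketch of such a mover, run from cron; the directory names and the 14-day cutoff are assumptions, so adjust them to your retention constraints:

    #!/usr/bin/env python3
    import os, shutil, time

    SRC = "/var/log/app"             # hypothetical monitored directory
    DST = "/var/log/app-archive"     # outside the [monitor://] expression
    CUTOFF_SECONDS = 14 * 24 * 3600  # move anything untouched for 14 days

    os.makedirs(DST, exist_ok=True)
    now = time.time()
    for name in os.listdir(SRC):
        path = os.path.join(SRC, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > CUTOFF_SECONDS:
            shutil.move(path, os.path.join(DST, name))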

Besides long startup times, a large number of monitored files also affects your index-time latency, i.e. the time delta between an event being logged and it being available for search in Splunk. Maybe you can spin your enforcement argument that way and get the right thing done? 😉

stanwin
Contributor

Thanks ssievert!!

Yes, the best solution is to reduce the number of files; everything else is just patchwork on top of the bigger issue.

lfedak_splunk
Splunk Employee

Hey @stanwin, if they solved your problem, remember to "Accept" an answer to award karma points 🙂

s2_splunk
Splunk Employee

Every file that is monitored by Splunk is tracked in an index on the forwarder called _fishbucket. The best practice is to have a process in place that moves log files that should no longer be processed by Splunk out of the monitored directory, or renames them so that they no longer match the file name pattern in the monitor stanza.
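
As a hedged illustration of that practice (paths and names are hypothetical): if the stanza only matches *.log, a rotation job that renames files to *.log.archived takes them out of scope automatically:

    [monitor:///var/log/app/*.log]
    sourcetype = app_logs

    # mv app.log app.log.archived  -> no longer matches *.log above,
    # so the forwarder stops tracking the renamed file.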
