This is probably a pretty basic issue, but one thing I've noticed with my setup is that whenever I do something that requires a restart, Splunk takes a VERY long time to walk all the directories and sub-directories of certain monitor: inputs and determine whether the files in them need to be indexed.
The vast majority of the time, they don't - the data in question is almost always already indexed, but it takes Splunk ages to realise that.
Example:
11-08-2011 11:05:48.468 +1100 DEBUG TailingProcessor - File state notification for path='/syslogs/fubar/2011/10/21/file1' (first time).
11-08-2011 11:05:48.469 +1100 DEBUG TailingProcessor - Item '/syslogs/fubar/2011/10/21/file1' matches stanza: /syslogs/fubar.
11-08-2011 11:05:48.469 +1100 DEBUG TailingProcessor - Storing config '/syslogs/fubar' for app ''.
11-08-2011 11:05:48.469 +1100 DEBUG TailingProcessor - Entry is associated with 1 configuration(s).
11-08-2011 11:05:48.469 +1100 DEBUG TailingProcessor - Will attempt to read file: /syslogs/fubar/2011/10/21/file1.
11-08-2011 11:06:27.386 +1100 DEBUG TailingProcessor - Got classified_sourcetype='file1-3' and classified_charset='UTF-8'.
11-08-2011 11:06:27.386 +1100 DEBUG TailingProcessor - About to read data (Opening file: /syslogs/fubar/2011/10/21/file1).
11-08-2011 11:06:27.386 +1100 DEBUG TailingProcessor - Hit EOF immediately.
Within the /syslogs/ mount point there are a number of services, broken down by year, month and day. The "file1" file is updated throughout the day via syslog-ng. The problem is that if I restart Splunk, the TailingProcessor does what it should probably do and re-scans this, and every other file...
What's the best/right way to speed this process up so that Splunk doesn't take several hours to go through every file in the monitor: paths and re-check them all if you restart the server?
Are you using the ignoreOlderThan parameter?
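For reference, a minimal sketch of what that could look like in inputs.conf. The stanza path is taken from the debug output in the question; the 7d window is only an assumed value for illustration, so pick something that matches how long your files actually keep being written to.

```ini
# inputs.conf (sketch; the 7d value is an assumption, not a recommendation)
[monitor:///syslogs/fubar]
# Skip files whose modification time is older than this window,
# so the tailing processor does not open and re-check them after a restart.
ignoreOlderThan = 7d
```

One thing to watch, as far as I recall from the inputs.conf spec: a file that is already outside the window when Splunk first sees it may not be picked up later even if it gets written to again, so it is safer to make the window comfortably longer than the period during which syslog-ng still appends to a given day's file.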
See, I knew there had to be an obvious one here. I was staring at the inputs.conf documentation page and somehow completely skipped this. D'oh.