Splunk recently fell over because the dispatch directory sits on an ext2 filesystem and hit ext2's limit of roughly 32000 subdirectories per directory, so the OS refused to let Splunk create any more and all searches failed. The jobs dated back six months or more; it looked like no jobs had ever been removed.
I removed old jobs by hand (find, xargs, rm, etc.; see the sketch below) to free up directory slots, restarted Splunk, and searches started working again. Almost none of the jobs I had to remove had "save" set.
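For reference, the manual cleanup was along these lines. This is only a sketch: the /opt/splunk path and the 14-day cutoff are assumptions, so adjust both for your install and your longest job TTL before running anything like it.

    # Remove dispatch job directories older than 14 days (assumed cutoff).
    # Assumes Splunk is installed under /opt/splunk.
    cd /opt/splunk/var/run/splunk/dispatch
    find . -mindepth 1 -maxdepth 1 -type d -mtime +14 -print0 | xargs -0 -r rm -rf

Note this blindly removes old jobs whether or not they were saved, which is another reason a proper Splunk-side reaper would be preferable.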
My question is: surely something in Splunk is normally meant to delete old jobs automatically once their TTL has expired? What component is responsible for that, and how can I debug why it failed to clean up?
Sure, I can add a cron job to do the find|xargs|rm dance (something like the line below), but that feels very wrong. Splunk is meant to take care of this itself, no?
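A hypothetical crontab entry, with the same assumed path and cutoff as the sketch above:

    # Nightly at 03:30: reap dispatch job directories older than 14 days.
    30 3 * * * find /opt/splunk/var/run/splunk/dispatch -mindepth 1 -maxdepth 1 -type d -mtime +14 -print0 | xargs -0 -r rm -rf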
This seems to be related to
http://splunk-base.splunk.com/answers/28390/minimum-free-disk-space-1000mb-reached-for-optsplunkvarrundispatch
and
http://splunk-base.splunk.com/answers/29551/too-many-search-jobs-found-in-the-dispatch-directory
but they don't seem to answer it.