The key to the final resolution is that our use case involves a lot of small files, and we have a notable latency (~5ms) between filer and indexer.
Splunk uses stat() and access() a fair bit during its various uptake cycles. With lots of small files (as opposed to a few big ones), Splunk spends expensive, uncached iops to stat() each file as it traverses the inputs.
Had the situation been reversed (a few big files), the readahead cache would've kicked in, and the effect of the latency would've been negligible.
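To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The file count (100,000) and the two metadata calls per file (one stat(), one access()) are assumptions for illustration only, not figures from our environment, and it assumes every call is uncached and issued one after another:

```python
def estimated_metadata_seconds(n_files, calls_per_file, rtt_seconds):
    """Rough time spent waiting on uncached, serialized metadata calls."""
    return n_files * calls_per_file * rtt_seconds


# Hypothetical workload: 100,000 small files, 2 metadata calls per file.
N_FILES = 100_000
CALLS_PER_FILE = 2

for rtt in (0.005, 0.001):  # ~5 ms (filer to indexer) vs ~1 ms
    secs = estimated_metadata_seconds(N_FILES, CALLS_PER_FILE, rtt)
    print(f"RTT {rtt * 1000:.0f} ms: roughly {secs:.0f} s of waiting per traversal")
```

The cost scales with the number of files rather than the amount of data, which is why a few big files would not show the same penalty.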
To mitigate this a little, we added forwarders closer to the source (<1ms) to take advantage of the lower RTT on the noncached iops. Curiously, we've observed NFS caching being drastically less effective on access() calls at higher latencies, but we're still investigating some of these interesting side effects.
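If you want to check whether you're in the same situation, a minimal sketch like the following times stat() and access() per file on a mount. This is not how Splunk does its traversal, just a way to compare a high-latency host with a low-latency one; the function names are made up for this example. Note that the directory walk itself may warm the NFS attribute cache, so treat the numbers as indicative only:

```python
import os
import statistics
import time


def time_metadata_calls(root):
    """Walk root and time one stat() and one access() call per file."""
    stat_times, access_times = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            t0 = time.perf_counter()
            try:
                os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            t1 = time.perf_counter()
            os.access(path, os.R_OK)
            t2 = time.perf_counter()
            stat_times.append(t1 - t0)
            access_times.append(t2 - t1)
    return stat_times, access_times


def report(root):
    """Print call count, median, and total time for each call type."""
    stat_times, access_times = time_metadata_calls(root)
    for label, samples in (("stat()", stat_times), ("access()", access_times)):
        if not samples:
            print(f"{label}: no files found")
            continue
        median_ms = statistics.median(samples) * 1000
        print(f"{label}: {len(samples)} calls, "
              f"median {median_ms:.3f} ms, total {sum(samples):.2f} s")
```

Call report() with the path of the monitored directory on each host and compare the medians. Running it twice in a row on the same host gives a rough idea of how much the client-side cache is helping.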