Splunk version: 6.3.4
Search head cluster: 3 nodes
Indexers (distributed search peers): 4 nodes
I see the following error messages in _internal:
WARN DistBundleRestHandler - There was a problem renaming: /opt/splunk/var/run/searchpeers/F7521905-DA3E-4B9B-B2FE-08B911826B00-1465250902.b469fbba316fbf76.tmp -> /opt/splunk/var/run/searchpeers/F7521905-DA3E-4B9B-B2FE-08B911826B00-1465250902: File exists
ERROR DistBundleRestHandler - Problem untarring file: /opt/splunk/var/run/searchpeers/F7521905-DA3E-4B9B-B2FE-08B911826B00-1465250902.bundle
WARN DistributedBundleReplicationManager - Asynchronous bundle replication to 4 peer(s) succeeded; however it took too long (longer than 10 seconds): elapsed_ms=48188, tar_elapsed_ms=10311, bundle_file_size=344190KB, replication_id=1465250902, replication_reason="async replication allowed"
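
To see how often this happens and how fast the bundle is growing, the elapsed_ms and bundle_file_size fields from the WARN line above can be charted from _internal. This is a minimal sketch of such a search (adjust the time range and span to taste):

index=_internal sourcetype=splunkd DistributedBundleReplicationManager
| timechart span=1h max(elapsed_ms) AS max_elapsed_ms max(bundle_file_size) AS max_bundle_kb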
Yes. As your search load increases, the bundle size grows too, and the problem is compounded by the higher workload. Eventually the bundle replications will time out and fail completely, and the searches that depend on them will fail as well. For this reason, many people set up a cron job that deletes any file older than x days (typically 7) from the dispatch directory. Old files left behind by long-forgotten |outputcsv commands often accumulate there and cause this problem.
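
A minimal sketch of that cron job, assuming $SPLUNK_HOME is /opt/splunk and that 7 days is an acceptable retention (both are assumptions; adjust them to your environment, and dry-run the find without the -exec action first):

# Daily at 03:00: remove dispatch artifact directories untouched for 7+ days.
0 3 * * * find /opt/splunk/var/run/splunk/dispatch -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

Splunk also reaps dispatch artifacts on its own based on their ttl, so treat this as a safety net rather than a replacement for tuning ttl, and take care not to remove artifacts of searches that are still running.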