This started a few weeks ago for no reason I can identify, and I cannot get bundle replication working again. Environment: Splunk v6.2.1 on Windows 2008 R2, with 1 search head and 3 indexers.
splunkd.log:
01-13-2015 07:35:51.950 -0600 WARN NetUtils - SSL_read failed with SSL_ERROR_SYSCALL. bytes=-1 winsock error=10053
01-13-2015 07:35:51.950 -0600 ERROR DistributedBundleReplicationManager - Unable to upload bundle to peer named server with uri=https://server:8089.
01-13-2015 07:35:51.950 -0600 WARN DistributedBundleReplicationManager - Synchronous bundle replication to 3 peer(s) succeeded; however it took too long (longer than 10 seconds): elapsed_ms=23760, tar_elapsed_ms=8378, bundle_file_size=820260KB, replication_id=1421156128, replication_reason="sync replication required to establish common bundles across all search peers"
01-13-2015 07:35:51.950 -0600 WARN ISplunkDispatch - sid:scheduler__nobody__sos__RMD5fe2b0603bfc33e11_at_1421155530_24 Expected common latest bundle version on all peers after sync replication, found none. Reverting to old behavior - using most recent bundles on all
01-13-2015 07:35:51.950 -0600 WARN ISplunkDispatch - Expected common latest bundle version on all peers after sync replication, found none. Reverting to old behavior - using most recent bundles on all
01-13-2015 07:35:51.965 -0600 WARN DistributedPeerManager - Unable to distribute to peer named server1 at uri https://ops-server1:8089 because replication was unsuccessful. replicationStatus Failed
01-13-2015 07:35:51.965 -0600 WARN DistributedPeerManager - Unable to distribute to peer named server2 at uri https://ops-server2:8089 because replication was unsuccessful. replicationStatus Failed
01-13-2015 07:35:51.965 -0600 WARN DistributedPeerManager - Unable to distribute to peer named server3 at uri https://ops-server3:8089 because replication was unsuccessful. replicationStatus Failed
I'm getting a similar issue.
Search heads are 6.2.1 and indexers are 6.1.5.
Inside /opt/splunk/var/run/searchpeers/ I'm not seeing the bundle dir for my search heads.
Running btool on distsearch shows "shareBundles = true" and "useSHPBundleReplication = false".
Just solved this. It turned out that 2 of my 3 search head cluster members hadn't restarted correctly, so there was no synchronised bundle to replicate to the search peers. Once I restarted them and a captain was elected (splunk show shcluster-status), the message disappeared.
The clue was in the error, which named a specific peer id (in my case 3eexxxxxxxxxx). On my indexer there was no bundle dir starting with 3ee, while the output of the shcluster-status command shows an "id" field with the same 3ee value. So it looks like, for bundle replication to occur, the captain has to provide the bundle on behalf of all search head cluster members (they all give the same id to the search peers).
How do you do a "btool on distsearch"?
How do you "elect a captain"?
Run "splunk btool distsearch list --debug" from $SPLUNK_HOME/bin.
Captains are elected dynamically by default, so there is nothing you actually have to do.
I'm having the same problem. The error appears on the captain only. I restarted the search head cluster and a new captain was elected, and now the problem is on the newly elected captain.
We are using search head clustering and indexer clustering.
Any suggestions?
Can the captain connect to the cluster master and the search peers?
Yes. The weird thing is that we are getting results back and they are complete, but the warning is always there.
Did you find a solution for this?
I resolved this issue in my environment by checking the contents of the bundle. There were a few large files from an app that kept replication from occurring. Warning went away after deleting the files.
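For anyone who wants to check their own bundle the same way, here is a minimal Python sketch that lists the largest files inside it. It assumes the bundle is a plain tar archive (on a search head they typically sit under $SPLUNK_HOME/var/run with a .bundle extension); the function name and the example path are only illustrative, not anything Splunk ships.

```python
import tarfile


def largest_members(bundle_path, top_n=10):
    """Return (size_in_bytes, name) pairs for the top_n largest regular files."""
    # "r:*" lets tarfile detect compression by itself (assumption: the bundle is a tar archive).
    with tarfile.open(bundle_path, "r:*") as archive:
        sizes = [(member.size, member.name)
                 for member in archive.getmembers()
                 if member.isfile()]
    # Largest first; ties are broken by name because tuples compare element by element.
    return sorted(sizes, reverse=True)[:top_n]


# Example usage (hypothetical path, adjust to your own search head):
# for size, name in largest_members("/opt/splunk/var/run/myhost-1421156128.bundle"):
#     print(f"{size / (1024 * 1024):10.1f} MB  {name}")
```

Anything unexpectedly large near the top of that list (big lookups, binaries shipped inside an app) is a candidate for removal or exclusion from replication.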
Thank you for your reply ^^ I had the same problem because some of the files in the bundle didn't have the right permissions, so Splunk couldn't read them.
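A rough way to hunt for such files is to walk the directory that feeds the bundle and flag anything that looks unreadable. The sketch below is only an approximation under stated assumptions: it should be run as the same OS user that runs Splunk, it checks the owner-read bit plus a plain access test, and it will not catch every ownership or ACL problem (particularly on Windows). The function name is made up for illustration.

```python
import os
import stat


def unreadable_files(root):
    """Return sorted paths under root that lack the owner-read bit or fail an access check."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                # Broken symlinks and similar oddities cannot be read either.
                problems.append(path)
                continue
            if not (mode & stat.S_IRUSR) or not os.access(path, os.R_OK):
                problems.append(path)
    return sorted(problems)


# Example usage (hypothetical path):
# for path in unreadable_files("/opt/splunk/etc/apps"):
#     print(path)
```

Note that os.walk silently skips directories it cannot enter, so an unreadable directory shows up as missing files rather than as an entry in the list.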
I had similar issues due to large lookups. I blacklisted the lookups in distsearch.conf on the search head that was down, and that fixed it.
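As a sketch of that approach, the Python below scans for large lookup files and prints a candidate [replicationBlacklist] stanza for distsearch.conf. The assumptions are mine: that patterns in this stanza are matched against paths relative to $SPLUNK_HOME/etc, that forward slashes are acceptable on your platform, and that a 50 MB threshold is sensible. The entry names (skip_1, skip_2, ...) are arbitrary labels. Check distsearch.conf.spec for your version before using the output.

```python
import os


def blacklist_stanza(etc_dir, min_bytes=50 * 1024 * 1024):
    """Build a [replicationBlacklist] stanza for lookup files of at least min_bytes."""
    lines = ["[replicationBlacklist]"]
    for dirpath, dirnames, filenames in os.walk(etc_dir):
        dirnames.sort()  # walk directories in a stable order
        if os.path.basename(dirpath) != "lookups":
            continue
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            if os.path.getsize(path) >= min_bytes:
                # Assumption: patterns are relative to $SPLUNK_HOME/etc.
                relative = os.path.relpath(path, etc_dir).replace(os.sep, "/")
                lines.append(f"skip_{len(lines)} = {relative}")
    return "\n".join(lines)


# Example usage (hypothetical path):
# print(blacklist_stanza("/opt/splunk/etc"))
```

Keep in mind that a lookup excluded this way is no longer shipped to the indexers, so searches that expect it there may need adjusting.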