I have a very busy search head that logs the following warning:
DistributedPeerManager - Unable to distribute to peer named slxxxxxxxxx:9089 at uri https://xxxxxxxx037:9089 because peer has status = "Down"
The messages start in splunkd.log at 22:08:10.971 and finish at 22:09:46.994, and the message is reported about 60 times during that short period. A telnet from the SH to the indexer on 9089 shows no connectivity issues.
This has happened off and on for all indexers configured in distributed search. I am wondering if there is a setting that could be adjusted to prevent these messages from occurring, or if there is a conf value that could be adjusted to improve performance under high load. The SH has 10 vCPUs and 32 GB of RAM, and there is a high load average on the SH and indexers (lots of searches).
There appears to be no negative impact from the messages, since searches are working and users are not reporting any issues.
Hi lisaac, given how busy the hosts involved in the search are, it seems reasonable that momentary periods of high latency could generate these messages. There are various timeout settings described in http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Distsearchconf that could adjust the environment's expectations, for instance:
# this stanza controls the timing settings for connecting to a remote peer and
# the send timeout
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60
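Before and after changing any timeout, it may help to count how often each peer is reported as "Down", so you can tell whether the change made a difference. Below is a minimal Python sketch for that. It assumes the message text matches the line quoted in the question; the function name `count_peer_down` and the sample peer names are made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape, taken from the warning quoted in the question.
PEER_DOWN = re.compile(
    r'Unable to distribute to peer named (?P<peer>\S+) at uri (?P<uri>\S+) '
    r'because peer has status = "Down"'
)


def count_peer_down(lines):
    """Return a Counter mapping peer name -> number of 'Down' messages."""
    counts = Counter()
    for line in lines:
        match = PEER_DOWN.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


# Hypothetical sample lines; in practice, pass an open splunkd.log file handle.
sample = [
    'DistributedPeerManager - Unable to distribute to peer named peerA:9089 '
    'at uri https://peerA:9089 because peer has status = "Down"',
    'DistributedPeerManager - Unable to distribute to peer named peerA:9089 '
    'at uri https://peerA:9089 because peer has status = "Down"',
    'some unrelated log line',
]

for peer, hits in count_peer_down(sample).most_common():
    print(peer, hits)
```

To run it against a real log, replace `sample` with a file handle, for example `with open(path) as fh: count_peer_down(fh)`, where `path` points at your splunkd.log.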
Did this start happening after a recent 6.3 upgrade? What platform are you running?
I've seen this message recently too, following my 6.3 upgrade and some new app installs.