We have a 5-node search head cluster (SHC) on Splunk version 6.3. Captain election is not succeeding.
We followed the usual recovery steps and cleared _raft, but that did not help.
The steps we took were:
1) Stop all SHC members.
2) Clean _raft on all nodes: $SPLUNK_HOME/var/run/splunk/_raft
3) Restart all members.
4) Bootstrap all members using the command:
splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
This failed with the error: SHPRaftConsensus - NOT_LEADER CURRENT_STATE = FOLLOWER
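Step 2 has to be repeated on every node, so it can help to script it. Below is a minimal Python sketch, assuming splunkd has already been stopped on the node and you pass in that node's SPLUNK_HOME. The function name `clean_raft` is hypothetical, and treating a leftover `splunkd.pid` file as a sign that splunkd may still be running is our assumption, not a Splunk-documented check.

```python
import shutil
from pathlib import Path


def clean_raft(splunk_home: str) -> bool:
    """Remove $SPLUNK_HOME/var/run/splunk/_raft on this node.

    Returns True if _raft existed and was removed, False if it was already
    absent. Intended to run only after splunkd is stopped on this node.
    """
    run_dir = Path(splunk_home) / "var" / "run" / "splunk"
    pid_file = run_dir / "splunkd.pid"
    # Assumption: a leftover splunkd.pid means splunkd may still be running,
    # so refuse to touch _raft rather than risk cleaning a live member.
    if pid_file.exists():
        raise RuntimeError(f"{pid_file} exists; stop splunkd first")
    raft_dir = run_dir / "_raft"
    if not raft_dir.is_dir():
        return False
    # Delete the whole _raft directory tree.
    shutil.rmtree(raft_dir)
    return True
```

On each node, after stopping Splunk, you would call something like `clean_raft(os.environ["SPLUNK_HOME"])` and only restart members once it has run everywhere.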
The splunkd.log has the following entries:
01-05-2016 19:35:53.658 -0500 INFO ServerConfig - My server name is "test5421.xx.test.com".
01-05-2016 19:35:53.659 -0500 INFO ServerConfig - My hostname is "test5421".
01-05-2016 19:40:37.058 -0500 INFO SHPRaftConsensus - stepDown(1)
01-05-2016 19:40:37.058 -0500 INFO SHPRaftConsensus - Activating configuration 1:\n<configuration>\n<prev_configuration>\n<server>\n<server_id>https://test5421.xx.test.com:8089
01-05-2016 19:41:03.430 -0500 INFO SHPRaftConsensus - Running for election in term 2
01-05-2016 19:41:03.431 -0500 INFO SHPRaftConsensus - Now leader for term 2
01-05-2016 19:41:03.431 -0500 INFO SHPRaftConsensus - New commitIndex: 2
01-05-2016 19:41:03.431 -0500 INFO SHPoolingMgr - Making node the captain
01-05-2016 19:41:03.431 -0500 INFO SHPoolingMgr - makeOrChangeSlave - master_shp = https://test5421.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - stepDown(7495)
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Activating configuration 1:\n<configuration>\n<prev_configuration>\n<server>\n<server_id>https://test5421.xx.test.com:8089</server_id>\n</server>\n</prev_configuration>\n&...
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test5422.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://testa9437.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test9453.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test9454.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPoolingMgr - makeOrChangeSlave - master_shp = ?
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - NOT_LEADER CURRENT_STATE = FOLLOWER
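Note that the log above shows "stepDown(1)" and "stepDown(7495)", which does not seem right: the node became leader for term 2 and then, within the same second, stepped down and deleted the other four servers from its configuration.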
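To compare this pattern across all five members, the relevant SHPRaftConsensus events can be pulled out of each splunkd.log. This is a minimal Python sketch that assumes the line layout seen in the excerpt above (timestamp, level, component, " - ", message); the names `summarize_raft` and `RaftSummary` are hypothetical, and it only recognizes the three message shapes that appear in this log.

```python
import re
from dataclasses import dataclass, field

# Matches lines such as:
# 01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - stepDown(7495)
LINE_RE = re.compile(r"^(\S+ \S+ \S+) (\w+)\s+(\S+) - (.*)$")


@dataclass
class RaftSummary:
    step_downs: list = field(default_factory=list)       # (timestamp, argument)
    leader_terms: list = field(default_factory=list)     # (timestamp, term)
    deleted_servers: list = field(default_factory=list)  # server URIs


def summarize_raft(lines):
    """Collect stepDown, leader-election and server-deletion events."""
    summary = RaftSummary()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip continuation lines and other components (e.g. SHPoolingMgr).
        if not m or m.group(3) != "SHPRaftConsensus":
            continue
        ts, msg = m.group(1), m.group(4)
        if (s := re.match(r"stepDown\((\d+)\)", msg)):
            summary.step_downs.append((ts, int(s.group(1))))
        elif (s := re.match(r"Now leader for term (\d+)", msg)):
            summary.leader_terms.append((ts, int(s.group(1))))
        elif msg.startswith("Exiting and deleting server"):
            # Message form: "Exiting and deleting server : https://host:8089"
            summary.deleted_servers.append(msg.split(":", 1)[1].strip())
    return summary
```

You could feed it each member's `$SPLUNK_HOME/var/log/splunk/splunkd.log` (for example `summarize_raft(open(path))`) and check whether every node shows the same leader-then-immediate-stepDown sequence.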
This could be a network issue causing append entries to fail during bootstrapping; check splunkd.log for related errors.
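A quick first test for the network theory is whether each member's management port accepts TCP connections. The Python sketch below is only that: a successful connect shows the port is reachable from the host running the script, not that Raft append entries will succeed, and it says nothing about the reverse direction, so it would need to be run from every member. The function name `check_members` is hypothetical.

```python
import socket


def check_members(members, timeout=3.0):
    """Try a TCP connect to each member's management port.

    members: strings like "https://host:8089" or "host:8089".
    Returns {member: None} on success or {member: error text} on failure.
    """
    results = {}
    for member in members:
        # Accept both "https://host:8089" and "host:8089".
        hostport = member.split("://", 1)[-1]
        host, _, port = hostport.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[member] = None
        except OSError as exc:
            # Covers connection refused, unreachable host and timeouts.
            results[member] = str(exc) or type(exc).__name__
    return results
```

For this cluster the input would be the five `https://...:8089` server IDs from the log.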
Here is what worked:
1) Stop all 5 SHC members.
2) Clean _raft on all nodes: $SPLUNK_HOME/var/run/splunk/_raft. NOTE: it must be cleaned on every node.
3) Restart all 5 SHC members.
4) Bootstrap a single member with a command like the one below, then add the remaining members as peers by running add peer on the bootstrapped captain:
splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>" -auth <username>:<password>
Here is the reference for adding a peer:
http://docs.splunk.com/Documentation/Splunk/6.2.0/DistSearch/Addaclustermember#Add_the_instance
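The working sequence (bootstrap one member, then add the others from the captain) can be generated as a list of commands. This Python sketch only builds the command lines and does not run them; it assumes the members were already configured as SHC members before the failure, and it uses `splunk add shcluster-member -new_member_uri`, which is how we read the "Add the instance" section of the page linked above. The function names `recovery_commands` and `to_shell` and the example hosts are hypothetical.

```python
import shlex


def recovery_commands(captain, peers, auth):
    """Build CLI commands: bootstrap one captain, then add each peer.

    captain and peers are "<URI>:<management_port>" strings;
    auth is "<username>:<password>". Nothing is executed here.
    """
    # Run on the chosen node: bootstrap with only itself in servers_list.
    cmds = [[
        "splunk", "bootstrap", "shcluster-captain",
        "-servers_list", captain, "-auth", auth,
    ]]
    # Run on the bootstrapped captain, one command per remaining member.
    for peer in peers:
        cmds.append([
            "splunk", "add", "shcluster-member",
            "-new_member_uri", peer, "-auth", auth,
        ])
    return cmds


def to_shell(cmds):
    """Render each argv list as a shell-quoted command line."""
    return [shlex.join(cmd) for cmd in cmds]
```

For example, `to_shell(recovery_commands("https://sh1.example.com:8089", ["https://sh2.example.com:8089"], "admin:changeme"))` yields the bootstrap line for sh1 followed by one add line for sh2; adding members one at a time makes it easier to see which peer, if any, fails to join.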
When you clear _raft, make sure all the nodes are stopped.
Can you try bootstrapping just one member and then adding the others as peers using add peer on the bootstrapped captain?