I have three search heads (sh1, sh2, sh3) set up as a search head cluster, and two indexers (in1, in2).
in1 and in2 are search peers of all three search heads.
sh1 is the captain.
I checked the Distributed search status in the web UI of each of the three search heads. Every peer's status is Up, but only the captain's Replication status is Successful; on the other members the Replication status always stays at Initial.
I found that the Replication status becomes Successful whenever an instance becomes the captain (for example, if sh2 is the captain now, its Replication status will be Successful).
I have restarted many times, but the result is still the same.
Support has confirmed that our concern about the Replication status of Initial is expected behavior: the search head captain is the one that always sends bundles to the peers, so its status is the only one shown as Successful.
This behavior has changed from previous versions so they are going to request a Splunk doc change.
http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture#Role_of_the_captain
This is by design as of v6.2.3, because the captain is the only Splunk instance that performs bundle replication.
Non-captain members in SHC just check connectivity to search peers without sending a search time knowledge bundle to peers.
Thank you!!!
May I ask whether there is any documentation that states "Non-captain members in SHC just check connectivity to search peers without sending a search time knowledge bundle to peers."?
Currently the documentation does not contain the exact phrase you're looking for. The captain is the only one that sends the bundle, which means non-captain members won't send it. The rest is basically the same as general distributed search from a bundle replication point of view. The view shows the peer status as Up or Down depending on connectivity.
Many thanks!
Can you confirm that your replication port is configured and is not the same as your management port? For example, the management port is TCP/8089 and the SHC replication port is TCP/8999.
You can check this in $SPLUNK_HOME/etc/system/local/server.conf.
After that, confirm that the replication port is open between all SHC members. On Linux, you can use netcat to validate the connectivity between the hosts:
host1$ nc -z host2 8999
host1$ nc -z host3 8999
And do the same for the others.
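To avoid typing the netcat check once per pair of hosts, it could be wrapped in a small loop. This is only a sketch: the function names `port_open` and `check_members` are made up for illustration, and it assumes your netcat build supports the `-z` and `-w` options.

```shell
# Return 0 if a TCP connection to host $1 on port $2 succeeds within 3 seconds.
# Assumes a netcat that supports -z (scan only) and -w (timeout in seconds).
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Usage: check_members PORT HOST...
# Prints one status line per host.
check_members() {
    port="$1"
    shift
    for host in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port CLOSED"
        fi
    done
}
```

Running `check_members 8999 host1 host2 host3` on each member in turn covers every direction between the hosts.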
The key point to take away is that Raft elections for captain are done over the management ports, and replication is done over the replication port. These two cannot be the same, but at this time Splunk won't report an error if you configure the replication port to be the same as your management port.
My replication port is not the same as my management port,
but I use the same replication port for all members.
Is this right?
Correct, the replication port should be the same for all SHC members. Note that this should be listed under the [shclustering] stanza in your server.conf file. Don't mistake the SHC replication port for the indexer replication port. These are different.
So should I configure an indexer replication port?
This is the server.conf of my SHC member. Should I move [replication_port://12345] under [shclustering]?
[general]
pass4SymmKey = $1$F/pkP+Y9
serverName = SH1
[replication_port://12345]
[shclustering]
disabled = 0
mgmt_uri = https://SH1:8089
pass4SymmKey = $1$F/pkP+Y9
id = D7D50D48-6DAE-4A7E-AEEC-2855E373C56F
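For a server.conf shaped like the one above, a quick way to confirm that the replication port and the management port differ is to pull both values out of the file and compare them. The following is a rough sketch, not an official Splunk tool: the function names are hypothetical, and it assumes the replication port appears as a `[replication_port://<port>]` stanza header and the management port is the number at the end of `mgmt_uri`.

```shell
# Print the port from the first [replication_port://<port>] stanza header.
replication_port() {
    sed -n 's/^\[replication_port:\/\/\([0-9][0-9]*\)\].*$/\1/p' "$1" | head -n 1
}

# Print the port at the end of the first mgmt_uri value.
mgmt_port() {
    sed -n 's/^mgmt_uri[[:space:]]*=.*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Return 0 only when both ports were found and they differ.
ports_distinct() {
    repl=$(replication_port "$1")
    mgmt=$(mgmt_port "$1")
    [ -n "$repl" ] && [ -n "$mgmt" ] && [ "$repl" != "$mgmt" ]
}
```

For example, `ports_distinct "$SPLUNK_HOME/etc/system/local/server.conf" || echo "check your ports"` prints a warning when either value is missing or the two are equal.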
Make sure all of your search heads can talk to each other on 12345. You may have a firewall (iptables, ipfw) blocking this port, as it's not a standard one.
Next, you need to check splunkd.log in index=_internal for errors on the peers.
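If you would rather look at the file on disk than search index=_internal, a summary of ERROR and WARN lines per component can narrow things down. This is a sketch under assumptions: `summarize_errors` is a made-up helper, and it assumes the usual splunkd.log layout in which the log level is the fourth whitespace-separated field and the component name is the fifth.

```shell
# Count ERROR and WARN lines per component in a splunkd.log-style file.
# Assumes field 4 is the log level and field 5 is the component name.
# Output: "<count> <level> <component>", highest count first.
summarize_errors() {
    awk '$4 == "ERROR" || $4 == "WARN" { count[$4 " " $5]++ }
         END { for (k in count) print count[k], k }' "$1" | sort -rn
}
```

On a default install the file would typically be at `$SPLUNK_HOME/var/log/splunk/splunkd.log`, so `summarize_errors "$SPLUNK_HOME/var/log/splunk/splunkd.log" | head` shows the noisiest components first.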
Yes, all search heads can talk to each other on 12345.
If you have validated communication between nodes, then you need to start looking in the logs and see if there are any errors and troubleshoot from there.
When you push a package from the deployer, does it deploy successfully?
I don't have a deployer.
Is it necessary?
Yes. I recommend reading the documentation on architecture and requirements for SHC at http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture
The deployer plays an active role: when the SHC members start up, they need to connect to it and validate the package contents against what they have locally on disk and what the captain and other members have.
That being said, in a lab environment you can get away without a deployer. However, this isn't supported and can cause some adverse effects.
I added a deployer and the deployment succeeded, but the Replication status is still Initial on every instance except the captain and the deployer.
I am researching the same problem. Have you made some progress?
No, I don't have any idea.
Have you?