Solved: Search Head Clustering: Distributed search status is up, but why is replication status always initial for all search heads except the captain?

Kirk_Hsu · 05-26-2015

I have three search heads (sh1, sh2, sh3) set up as a search head cluster and two indexers (in1, in2). in1 and in2 are search peers of all three search heads.

sh1 is the captain.
I checked the distributed search status in the web UI of each of the three search heads. All peer statuses are Up, but only the captain's replication status is Successful; the replication status on every other member is always Initial.

I found that an instance's replication status becomes Successful when it becomes the captain (for example, if sh2 is the captain, its replication status is Successful).

I have rebooted many times, but the result is the same.

bfernandez · 06-03-2015

Support has confirmed that the behavior we were concerned about (replication status Initial) is expected: the search head captain is the only member that sends bundles to the peers, so its status is the only one shown as Successful.

This behavior has changed from previous versions, so they are going to request a change to the Splunk documentation.

Masa · 06-04-2015

http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture#Role_of_the_captain

As of v6.2.3 this is by design, because the captain is the only Splunk instance that performs bundle replication.
Non-captain members in SHC just check connectivity to search peers without sending a search time knowledge bundle to peers.

Kirk_Hsu · 06-04-2015

Thank you!!!

May I ask whether any document states "Non-captain members in SHC just check connectivity to search peers without sending a search time knowledge bundle to peers."?

Masa · 06-05-2015

Currently the documentation does not contain the exact phrase you're looking for. The captain is the only member that sends the bundle, which means non-captain members won't send it. Otherwise, from a bundle replication point of view, it works basically the same as general distributed search: the view shows each peer's status as Up or Down depending on connectivity.
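To make the explanation concrete, here is a toy model of what each member's distributed search status page shows. It encodes only the rule described in this thread (every member checks peer connectivity, only the captain replicates the knowledge bundle). It is not Splunk code, the function name is hypothetical, and it assumes every bundle push by the captain succeeds.

```python
def status_page(member, captain, peers, reachable):
    """Hypothetical model of one SHC member's distributed search status page.

    Returns {peer: (peer_status, replication_status)}.
    """
    page = {}
    for peer in peers:
        # Every member, captain or not, checks connectivity to each peer.
        peer_status = "Up" if peer in reachable else "Down"
        # Only the captain sends the knowledge bundle; this model assumes its
        # pushes always succeed, so non-captains stay at "Initial".
        replication_status = "Successful" if member == captain else "Initial"
        page[peer] = (peer_status, replication_status)
    return page


if __name__ == "__main__":
    peers = ["in1", "in2"]
    # Move the captain from sh1 to sh2 and print every member's view.
    for captain in ("sh1", "sh2"):
        print(f"captain = {captain}")
        for member in ("sh1", "sh2", "sh3"):
            print(f"  {member}: {status_page(member, captain, peers, set(peers))}")
```

Moving the captain from sh1 to sh2 in the model flips which member reports Successful, which matches what Kirk_Hsu observed in the original question.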

Kirk_Hsu · 06-03-2015

Many thanks!!!

esix_splunk · 06-01-2015

Can you confirm that your replication port is configured and is not the same as your management port? For example, management port TCP/8089, SHC replication port TCP/8999.

You can check this in $SPLUNK_HOME/etc/system/local/server.conf.

After that, confirm that the replication port is open between all SHC members. On Linux, you can use netcat to validate the connectivity between the hosts:

host1$ nc -z host2 8999
host1$ nc -z host3 8999

And do the same for the others.

The key point to take away is that Raft elections for the captain are done over the management port, and replication is done over the replication port. These two cannot be the same, but at this time Splunk won't raise an error if you configure the replication port to be the same as the management port.
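If you want to run the netcat check from every member without typing each command, the following is a minimal Python sketch of the same TCP reachability test, using only the standard library. The function names are hypothetical, and a successful connect only shows that something is listening and no firewall is in the way, not that SHC replication itself is working.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (roughly `nc -z`)."""
    try:
        # create_connection resolves the name and tries each address it returns.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_members(local: str, members: list[str], port: int,
                  timeout: float = 3.0) -> dict[str, bool]:
    """Probe `port` on every member except `local` (the host running this).

    Run it once on each SHC member to cover every direction, just as the
    `nc -z` commands have to be repeated on each host.
    """
    return {m: port_open(m, port, timeout) for m in members if m != local}
```

For example, on sh1 you might call `check_members("sh1", ["sh1", "sh2", "sh3"], 8999)` with the replication port from your own server.conf, then repeat on sh2 and sh3; any `False` entry points at a blocked or closed port in that direction.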

Kirk_Hsu · 06-01-2015

My replication port is not the same as my management port, but I use the same replication port for all members. Is this right?

esix_splunk · 06-01-2015

Correct, the replication port should be the same for all SHC members. Note that this goes with the [shclustering] settings in your server.conf file. Don't mistake the SHC replication port for the indexer cluster replication port; these are different.

Kirk_Hsu · 06-01-2015

So should I configure the indexer replication port?

This is the server.conf on my SHC member. Should I move [replication_port://12345] under [shclustering]?

[general]
pass4SymmKey = $1$F/pkP+Y9
serverName = SH1

[replication_port://12345]

[shclustering]
disabled = 0
mgmt_uri = https://SH1:8089
pass4SymmKey = $1$F/pkP+Y9
id = D7D50D48-6DAE-4A7E-AEEC-2855E373C56F
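As far as I know from server.conf.spec, the SHC replication port is configured as its own [replication_port://<port>] stanza next to [shclustering], which is the layout in the config above. A small Python sketch can read that layout and flag the two problems raised in this thread: no replication port at all, or one equal to the management port. It is a hypothetical helper, not a Splunk tool: it parses a single file with configparser (no merging of default/local layers), takes the management port from `mgmt_uri`, and ignores `replication_port-ssl` stanzas.

```python
import configparser
from typing import Optional
from urllib.parse import urlparse


def shc_ports(conf_text: str) -> tuple[Optional[int], list[int]]:
    """Return (management port, replication ports) found in server.conf text."""
    # interpolation=None: values such as pass4SymmKey may contain '$' or '%'.
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(conf_text)
    # The port is part of the stanza *name*, e.g. [replication_port://12345].
    repl = [int(name.split("://", 1)[1]) for name in cp.sections()
            if name.startswith("replication_port://")]
    mgmt_uri = cp.get("shclustering", "mgmt_uri", fallback=None)
    mgmt = urlparse(mgmt_uri).port if mgmt_uri else None
    return mgmt, repl


def shc_port_problems(conf_text: str) -> list[str]:
    """List the replication-port issues discussed in this thread."""
    mgmt, repl = shc_ports(conf_text)
    problems = []
    if not repl:
        problems.append("no [replication_port://<port>] stanza found")
    if mgmt is not None and mgmt in repl:
        problems.append(f"replication port {mgmt} is the same as the management port")
    return problems
```

Run against the server.conf above, `shc_port_problems` returns an empty list, since the replication port 12345 differs from the management port 8089.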

esix_splunk · 06-01-2015

Make sure all of your search heads can talk to each other on port 12345. A firewall (iptables, ipfw) may be blocking this port, since it's not a standard one.

Next, check splunkd.log in index=_internal for errors on the peers.
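If you would rather look at the log file directly on a member (usually $SPLUNK_HOME/var/log/splunk/splunkd.log) than search index=_internal, a minimal Python sketch can tally ERROR and WARN lines by component. The line layout in the regular expression is an assumption based on the usual splunkd.log format; check it against your own file before trusting the counts.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed splunkd.log line layout (verify against your own files):
#   <date> <time> <utc offset> <LEVEL> <Component> - <message>
LINE = re.compile(
    r"^\S+ \S+ [+-]\d{4}\s+(?P<level>[A-Z]+)\s+(?P<component>\S+) - (?P<msg>.*)$"
)


def problem_counts(lines: Iterable[str], levels=("ERROR", "WARN")) -> Counter:
    """Count (level, component) pairs for lines at the given log levels.

    Lines that do not match the assumed layout (for example the continuation
    lines of multi-line entries) are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m and m["level"] in levels:
            counts[(m["level"], m["component"])] += 1
    return counts
```

For example, `problem_counts(open(path)).most_common(10)` lists the components logging the most errors and warnings, which gives a starting point for troubleshooting.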

Kirk_Hsu · 06-01-2015

Yes, all search heads can talk to each other on 12345.

esix_splunk · 06-01-2015

If you have validated communication between the nodes, start looking in the logs for errors and troubleshoot from there.

When you push a package from the deployer, does it deploy successfully?

Kirk_Hsu · 06-01-2015

I don't have a deployer. Is it necessary?

esix_splunk · 06-01-2015

Yes. I recommend reading the documentation on SHC architecture and requirements at: http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture

The deployer plays an active role: when the SHC members start up, they need to connect and validate the package contents they have locally on disk against what the captain and the other members have.

That being said, in a lab environment you can get away without a deployer; however, this isn't supported and can cause adverse effects.

Kirk_Hsu · 06-01-2015

I added a deployer and deployed successfully, but the replication status is still Initial on every instance except the captain and the deployer.

bfernandez · 05-28-2015

I am researching the same problem. Have you made any progress?

Kirk_Hsu · 06-01-2015

No, I don't have any idea. Have you?
