Deployment Architecture

Search Head Clustering: Distributed search status is Up, but why is the replication status always Initial for all search heads except the captain?

Kirk_Hsu
Explorer

I have three search heads (sh1, sh2, sh3) set up as a search head cluster and two indexers (in1, in2).
in1 and in2 are configured as search peers of the three search heads.

sh1 is the captain.
I checked the distributed search status via each of the three search heads' web UI and found that every peer's status is Up, but only the captain's replication status is Successful; on all the other search heads the replication status is always Initial.

I also found that the replication status becomes Successful when an instance becomes the captain (for example, if sh2 is the captain now, its replication status will be Successful).

I have rebooted many times, but it's still the same.

1 Solution

bfernandez
Communicator

Support has confirmed that our concern about the Initial replication status is the expected behavior: the search head captain is the one that always sends bundles to the peers, so its status is the only one shown as Successful.

This behavior has changed from previous versions, so they are going to request a Splunk documentation change.


Masa
Splunk Employee

http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture#Role_of_the_captain

This is by design as of v6.2.3, because the captain is the only Splunk instance that sends the bundle replication.
Non-captain members of the SHC just check connectivity to the search peers without sending a search-time knowledge bundle to the peers.
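
For example, you can confirm which member is currently the captain by running something like this on any SHC member (the hostname prompt and admin credentials here are just placeholders):

sh2$ $SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:changeme

Whichever member that command reports as the captain is the one whose replication status you should expect to see as Successful.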

Kirk_Hsu
Explorer

Thank you!!!

May I ask whether there is any documentation that states "Non-captain members of the SHC just check connectivity to the search peers without sending a search-time knowledge bundle to the peers"?

0 Karma

Masa
Splunk Employee

Currently there is no documentation with the exact phrase you're looking for. The captain is the only member that sends the bundle, which means non-captain members won't send it. The rest is basically the same as general distributed search from a bundle-replication point of view. The view shows each peer's status as Up or Down depending on connectivity.
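
If you want to see the raw values behind that view, one option is to query the distributed search peers endpoint on each member, roughly like this (hostname and credentials are placeholders, and the exact field names can differ slightly between versions):

sh2$ curl -k -u admin:changeme "https://sh2:8089/services/search/distributed/peers?output_mode=json"

On a non-captain member you should still see the peers reported as Up even though their replication status stays Initial.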

0 Karma

Kirk_Hsu
Explorer

Many thanks!!!

0 Karma

esix_splunk
Splunk Employee

Can you confirm that your replication port is configured and is not the same as your management port? For example, management port TCP/8089 and SHC replication port TCP/8999.

You can check this in $SPLUNK_HOME/etc/system/local/server.conf.
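
For instance, a quick way to see what is actually configured on a member (the path assumes a default install location):

sh1$ grep -A 2 '^\[replication_port' $SPLUNK_HOME/etc/system/local/server.conf

If the stanza isn't there, check the merged view with $SPLUNK_HOME/bin/splunk btool server list --debug, since the setting could also live in an app's local directory.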

After that, confirm that the replication port is open between all SHC members. On Linux, you can use netcat to validate the connectivity between the hosts.

host1$ nc -z host2 8999
host1$ nc -z host3 8999

And do the same for the others.
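
To save some typing, a small loop like this checks them all from one box (host2, host3, and port 8999 are placeholders for your own members and replication port):

host1$ for h in host2 host3; do nc -z -v "$h" 8999; done

Repeat it from each member so you know the port is reachable in every direction.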

The key point to take away is that RAFT elections for the captain are done over the management port, and replication is done over the replication port. These two cannot be the same, but at this time Splunk won't error out if you configure the replication port to be the same as your management port.
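
A simple sanity check is to print both ports on each member and confirm they are different, for example (path assumes a default install):

host1$ $SPLUNK_HOME/bin/splunk show splunkd-port
host1$ grep '^\[replication_port' $SPLUNK_HOME/etc/system/local/server.conf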

0 Karma

Kirk_Hsu
Explorer

My replication port is not the same as my management port, but I use the same replication port for all members.
Is this right?

0 Karma

esix_splunk
Splunk Employee

The replication port for SHC members should all be the same, correct. Note that this is configured in server.conf along with the [shclustering] stanza. Don't mistake the SHC replication port for the indexer replication port; these are different.

0 Karma

Kirk_Hsu
Explorer

So should I configure an indexer replication port?

This is my server.conf for the SHC. Should I move [replication_port://12345] under [shclustering]?

[general]
pass4SymmKey = $1$F/pkP+Y9
serverName = SH1

[replication_port://12345]

[shclustering]
disabled = 0
mgmt_uri = https://SH1:8089
pass4SymmKey = $1$F/pkP+Y9
id = D7D50D48-6DAE-4A7E-AEEC-2855E373C56F

0 Karma

esix_splunk
Splunk Employee

Make sure all of your search heads can talk to each other on 12345. You may have a firewall (iptables/ipfw) blocking this port, since it's not a standard port.

Next, you need to check splunkd.log (index=_internal) for errors on the peers.
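
Something like this CLI search is a reasonable starting point (the credentials are placeholders, and the SHC* component names can vary a bit between versions):

sh1$ $SPLUNK_HOME/bin/splunk search 'index=_internal source=*splunkd.log* log_level=ERROR component=SHC* earliest=-4h' -auth admin:changeme

Run it on each member and compare what the non-captain members report.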

0 Karma

Kirk_Hsu
Explorer

Yes, all search heads can talk to each other on 12345.

0 Karma

esix_splunk
Splunk Employee

If you have validated communication between the nodes, then you need to start looking in the logs to see if there are any errors, and troubleshoot from there.

When you push a package from the deployer, does it deploy successfully?

0 Karma

Kirk_Hsu
Explorer

I don't have a deployer.
Is it necessary?

0 Karma

esix_splunk
Splunk Employee

Yes. I recommend reading the documentation on the architecture and requirements for SHC at: http://docs.splunk.com/Documentation/Splunk/6.2.3/DistSearch/SHCarchitecture

The deployer plays an active role: when the SHC members start up, they need to connect to it and validate the package contents against what they have locally on disk versus what the captain and the other members have.

That being said, in a lab environment you can get away without having a deployer; however, this isn't supported and can cause some adverse effects.
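
If you do add one, the basic wiring is small; roughly something like the following (hostnames, port, and credentials are placeholders):

# on every SHC member, in server.conf under [shclustering]
conf_deploy_fetch_url = https://deployer1:8089

# on the deployer (whose [shclustering] pass4SymmKey must match the cluster's),
# place apps under $SPLUNK_HOME/etc/shcluster/apps/ and push them:
deployer1$ $SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://sh1:8089 -auth admin:changeme

The -target can be any cluster member; the deployer then distributes the bundle to all of the members.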

0 Karma

Kirk_Hsu
Explorer

I tried adding a deployer and the deployment succeeded, but the replication status is still Initial everywhere except on the captain and the deployer.

0 Karma

bfernandez
Communicator

I am researching the same problem. Have you made any progress?

0 Karma

Kirk_Hsu
Explorer

No, I don't have any idea.
Have you?
