We are in the process of a full hardware upgrade of all the indexers in our distributed environment. We have three standalone search heads connected to a cluster of many indexers. We are proceeding one indexer at a time:

1. Load up a new indexer.
2. Integrate it into the cluster.
3. Take an old indexer offline, enforcing counts.

When the decommissioning process finishes and the old indexers are gracefully shut down, an alert appears on our search heads in the Splunk Health Report:

"The search head lost connection to the following peers: <decommissioned peer>. If there are unstable peers, confirm that the timeout (connectionTimeout and authTokenConnectionTimeout) settings in distsearch.conf are at appropriate values."

(The distsearch.conf settings the alert names are sketched at the end of this post.)

I cannot figure out why we are seeing this alert. My conclusion is that we must be missing a step somewhere. To decommission a server, we do the following:

1. On the indexer: splunk offline --enforce-counts
2. On the cluster master: splunk remove cluster-peers <GUID>
3. On the indexer: completely uninstall Splunk.
4. On the cluster master: rebalance indexes.

We have also tried reloading the health.conf configuration by running '| rest /services/configs/conf-health.conf/_reload' on the search heads, to no effect.

We cannot figure out where the health report is retaining this old data from. The _internal logs clearly show that the PeriodicHealthReporter component on the search heads begins alerting at the exact moment of the GracefulShutdown transition on the cluster master. The indexers in question are no longer listed as search peers on the search heads, and they are not listed as search peers on the cluster master either. The monitoring console looks fine.

What could we be missing?
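For reference, the two settings the alert message points to live in the [distributedSearch] stanza of distsearch.conf on each search head. The snippet below is only an illustrative sketch of where they sit; the values shown are placeholders (we have not changed these from the defaults in our environment), not recommendations:

    # $SPLUNK_HOME/etc/system/local/distsearch.conf on each search head
    # Illustrative placeholder values only; these are the settings named in the health alert.
    [distributedSearch]
    # Seconds to wait when establishing a connection to a search peer.
    connectionTimeout = 10
    # Seconds to wait when connecting to a peer during auth-token exchange.
    authTokenConnectionTimeout = 5

Since the peers in question have been deliberately removed rather than being unstable, tuning these timeouts does not seem like the right fix, which is why we suspect a missing decommissioning step instead.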