Deployment Architecture

After communication issues between cluster nodes, getting "replication factor not met" for search and indexing

brent_weaver
Builder

We had some communication issues over the past couple of days, and now my index master node is telling me that the replication factor is not met for both indexing and searching, yet all nodes are up and running. I thought this might take time to resolve, so I let it bake overnight, but it is still in this state. Any help is much appreciated, as I need to get this thing healthy and happy!


sowings
Splunk Employee

You've probably reached a point where this is no longer a problem. However, for posterity, I'll try to explain what happens:

  • Ordinarily, when new data arrives, we create "streaming copies" of buckets as the data flows into a hot bucket. These are live copies of the data, copied block by block to other indexers (up to the count of our "replication factor", RF) as the data arrives.
  • The communication issues meant that when new data arrived for an index, an attempt to create a new bucket couldn't be communicated to either the cluster master or another indexer. This produces an orphaned bucket.
  • This bucket will remain in this state until it rolls to warm. Once it's warm, it's no longer written to, so it can easily be copied 1:1 to its peers to satisfy the replication factor.
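The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Splunk's implementation: the `Bucket` class, the `stream_event` function, and the peer names are all hypothetical. It only shows how a hot bucket that streams its data to peers ends up with fewer than RF copies when those peers are unreachable while the data arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Hypothetical toy bucket; "hot" means it is still being written to.
    bucket_id: str
    state: str = "hot"
    copies: set = field(default_factory=set)  # names of peers holding a copy


def stream_event(bucket, origin, targets, reachable, rf):
    """Stream one block of new data from the origin indexer to target peers.

    Only peers in `reachable` receive the block. Unreachable peers are
    skipped, which is how a hot bucket ends up with fewer than `rf` copies.
    Returns True if the bucket meets the replication factor afterwards.
    """
    bucket.copies.add(origin)  # the originating indexer always has the data
    for peer in targets:
        if len(bucket.copies) >= rf:
            break  # enough copies already
        if peer in reachable:
            bucket.copies.add(peer)
    return len(bucket.copies) >= rf
```

With every peer reachable and RF=3, the bucket gets three copies; with a network partition, only the origin holds it, and nothing in this model repairs it while the bucket stays hot.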

How to correct it:

  • When an indexer joins the cluster (e.g. starting up), it provides a list of all of its data buckets to the cluster master.
  • If the CM sees a bucket that is new (to it) or doesn't yet meet RF, it will then kick off "non-streaming" copies to meet replication factor.
  • If the hot bucket has already rolled to warm (before the indexer with "orphaned" buckets is restarted), then triggering a 're-add' of the indexer may fix the situation. If the bucket hasn't rolled to warm yet, forcing it with the "roll-hot-buckets" trick will roll it from hot to warm, allowing it to be fixed up as a "non-streaming" copy.
  • Leaving the indexers alone will eventually let those hot buckets roll to warm (a number of parameters in indexes.conf govern this behavior), and once warm, they can be fixed up as "non-streaming" copies.
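The fix-up decision described in the steps above can be sketched as follows. This is a hypothetical model of the cluster master's reasoning, not its actual code: `plan_fixups` and its input shape are invented for illustration. Warm buckets below RF get a non-streaming copy; hot ones must be rolled to warm first. The helper that builds the roll request assumes the "roll-hot-buckets" trick maps to a POST on the `data/indexes/{name}/roll-hot-buckets` REST endpoint on the indexer's management port (8089 by default); the host and index names are placeholders, authentication is omitted, and the request is only built, not sent. Check the REST API reference for your version before relying on it.

```python
from collections import defaultdict
from urllib.parse import quote
from urllib.request import Request


def plan_fixups(peer_reports, rf):
    """Toy cluster-master pass over the bucket lists peers report on (re)joining.

    peer_reports maps peer name -> {bucket_id: state}, state being "hot" or "warm".
    Returns {bucket_id: action} for every bucket with fewer than rf copies.
    """
    holders = defaultdict(set)
    states = {}
    for peer, buckets in peer_reports.items():
        for bucket_id, state in buckets.items():
            holders[bucket_id].add(peer)
            # If any reported copy is still hot, treat the bucket as hot.
            if states.get(bucket_id) != "hot":
                states[bucket_id] = state
    actions = {}
    for bucket_id, peers in holders.items():
        if len(peers) >= rf:
            continue  # replication factor already met
        if states[bucket_id] == "warm":
            actions[bucket_id] = "non-streaming copy"
        else:
            actions[bucket_id] = "roll to warm first"
    return actions


def roll_hot_buckets_request(host, index, port=8089):
    """Build (but do not send) an unauthenticated POST to roll an index's hot buckets."""
    url = f"https://{host}:{port}/services/data/indexes/{quote(index, safe='')}/roll-hot-buckets"
    return Request(url, method="POST")
```

For example, with RF=3, a warm bucket reported by two peers is scheduled for a non-streaming copy, while a hot bucket held by one peer is flagged to be rolled first; once it rolls (on its own, per the indexes.conf settings, or via the request above), the next pass treats it as warm.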