Deployment Architecture

[IndexerCluster] Impact on fix-up tasks if the down indexer recovers during fix-up task recovery

rbal_splunk
Splunk Employee

A question about fix-up tasks. The scenario: an indexer goes down, so the CM starts the fix-up. If the indexer returns to service BEFORE the fix-up tasks are completed, does the CM cancel the fix-up tasks, or does it complete them and leave you with excess buckets?

The concern is about the network throughput needed during recovery.

1 Solution

rbal_splunk
Splunk Employee

The CM will schedule at most 5 concurrent replication fixups and 3 concurrent searchable fixups per indexer. Jobs that are already scheduled (up to 5/3) won’t be canceled. However, the remaining buckets that weren’t yet “scheduled” won’t lead to excess buckets, since no jobs are scheduled for them after the indexer recovers.

time1 - indexer A goes down with 1000 buckets.
time2 - CM starts scheduling jobs to fix up RF/SF. The CM will schedule at most 5 RF/3 SF jobs per indexer at a time. As these jobs complete, more will be scheduled.
time3 - indexer A comes back up.

If, between time2 and time3, 50 jobs were scheduled and fixed, there’ll be 50 excess RF/SF copies. The remaining 950+ buckets that weren’t fixed won’t have any excess.
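
To make the arithmetic in the timeline concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions that are mine, not Splunk's: a single concurrency limit, every fix-up job taking the same number of ticks, and the CM stopping all new scheduling the moment the indexer returns. The function name and parameters are hypothetical and do not correspond to anything in Splunk.

```python
def simulate_fixups(total_buckets, max_concurrent, job_ticks, recovery_tick):
    """Toy model of the timeline above (not Splunk's actual scheduler).

    Assumptions: every job takes `job_ticks` ticks, at most
    `max_concurrent` jobs run at once, and no new jobs are scheduled
    once the indexer is back at `recovery_tick`. Jobs already in
    flight are not canceled, so they still produce a copy.

    Returns (excess_copies, buckets_never_scheduled).
    """
    pending = total_buckets   # buckets still waiting for a fix-up job
    in_flight = []            # remaining ticks for each running job
    completed = 0
    tick = 0
    while tick < recovery_tick and (pending or in_flight):
        # Fill any free slots, up to the concurrency limit.
        while pending and len(in_flight) < max_concurrent:
            pending -= 1
            in_flight.append(job_ticks)
        # Advance time by one tick and collect the jobs that finished.
        in_flight = [t - 1 for t in in_flight]
        completed += sum(1 for t in in_flight if t == 0)
        in_flight = [t for t in in_flight if t > 0]
        tick += 1
    # Completed jobs plus those still running each yield one extra copy.
    return completed + len(in_flight), pending


# 1000 buckets, 5 concurrent jobs, 1 tick per job, indexer back after 10 ticks
print(simulate_fixups(1000, 5, 1, 10))  # (50, 950)
```

With those inputs the model reproduces the numbers in the example: 50 excess copies, and 950 buckets that never had a job scheduled and so have no excess.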


rbal_splunk
Splunk Employee

If network bandwidth is a concern, there is a setting, new in 7.2, that caps how much bandwidth each indexer uses for “fixup” operations.

server.conf

max_nonhot_rep_kBps = <integer>
* This is the maximum throughput (kB(Bytes)/s) for warm/cold/summary 
  replications on a specific source peer. Similar to forwarder's maxKBps 
  setting in the limits.conf file.
* This setting throttles total bandwidth consumption for all 
  outgoing non-hot replication connections from a given source peer. 
  It does not throttle at the 'per-replication-connection', per-target 
  level.
* This setting is reloadable without restart if manually updated on the 
  source peers by using the command "splunk edit cluster-config" 
  or by making the corresponding REST call. We don't recommend updating 
  this setting across all the peers using bundle push because: 
    1) The push requires a rolling restart, as do all bundle pushes 
       with the server.conf file change.
    2) You might want to set different values on different peers.
* If set to 0, signifies unlimited throughput.
* Default: 0