Hi,
I moved to a multi-site cluster yesterday and I'm not entirely sure that replication is actually working within the cluster. Either it isn't, or the splunk commands aren't playing nicely with the new multi-site cluster feature.
This is my clustering stanza in server.conf on the master:
[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:2, site1:1, site2:1, total:3
site_search_factor = origin:1, site1:1, site2:1, total:2
pass4SymmKey = <REDACTED>
search_factor = 2
replication_factor = 3
I have 2 peers in each of 2 sites, with 1 search head in each site. All of the Splunk servers in the cluster are assigned sites in server.conf. I want a full searchable copy in each site for search affinity, hence the site_search_factor above.
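To make clear what I expect from those settings, here is a rough Python sketch of how I read the two factor strings. It is only my interpretation, not Splunk's code: the helper names `parse_factor` and `copies_per_site` are made up for illustration, and it assumes that when the origin site also has an explicit value, the larger of the two applies.

```python
def parse_factor(spec):
    """Parse a string like 'origin:2, site1:1, total:3' into a dict of ints."""
    return {
        key.strip(): int(value)
        for key, value in (item.split(":") for item in spec.split(","))
    }


def copies_per_site(spec, sites, origin_site):
    """Return the minimum copies I expect each site to hold for one bucket.

    Assumption (my reading, not confirmed): for the origin site, the larger
    of the 'origin' value and that site's explicit value applies.
    """
    factor = parse_factor(spec)
    total = factor.pop("total")
    origin = factor.pop("origin", 0)

    # Start from the explicit per-site values, defaulting to 0.
    plan = {site: factor.get(site, 0) for site in sites}
    plan[origin_site] = max(plan[origin_site], origin)

    # Copies required by 'total' but not pinned to any particular site.
    plan["unassigned"] = max(total - sum(plan[s] for s in sites), 0)
    return plan


sites = ["site1", "site2"]
replication = "origin:2, site1:1, site2:1, total:3"
search = "origin:1, site1:1, site2:1, total:2"

for origin_site in sites:
    print(origin_site, "replication:", copies_per_site(replication, sites, origin_site))
    print(origin_site, "search:", copies_per_site(search, sites, origin_site))
```

Under that reading, a bucket that originates in site1 should end up with 2 copies in site1 and 1 in site2, and one searchable copy in each site, which 2 peers per site should be able to satisfy.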
I suppose the first thing I should say is that I'm getting my information from the splunk show cluster-status --verbose command or from the cluster settings page on the master.
When all 4 peers are up, my search factor is met, but all indices except 2 only reach 2/3 for replication factor. Each of the others has between 1 and 8 buckets missing a third copy, and it never catches up. If I take down 1 peer in either site, my search factor drops to 1/2 for some portion of the indices and never recovers. The replication factor in this case either stays at 2/3 or drops to 1/3; it varies.
What makes me think this may be the tools working strangely is that it never recovers, despite there being no replication errors in splunkd.log (although I'm not sure whether replication fix-ups produce log messages at all). Also, if I bring the downed node back up and then take down the other node in the same site, I get the same result.
Maintenance mode is off on the master.
If it's any help, when I upgraded to a multi-site cluster I made a mistake and didn't enable maintenance mode on the master before bringing up each peer (I ran the command but didn't notice that it was prompting for a login). I'm not sure whether that has broken something.
While I'm at it, does anyone know when the splunk remove excess-buckets command will be enabled for multi-site clusters? I think I've pretty much got a searchable copy on every peer by now.