High availability: Splunk cluster across two datacenters

svenemil · 08-09-2013

Hi all!

Suppose I set up a Splunk cluster spanning two datacenters, with machine data sent to the indexers local to each datacenter. I have to make sure a copy of the machine data is replicated to the other datacenter.

The Managing Indexers and Clusters manual, in the chapter How clustered indexing works, states: “…you cannot currently specify which nodes will receive replicated data.”

On the other hand, server.conf.spec describes the option acceptFrom (in the replication_port stanza), which “Lists a set of networks or addresses to accept connections from.” My idea is to use this ACL to allow connections only from indexers located in the remote datacenter and to deny connections from all local indexers.

My question is: can this ACL be used to bypass the limitation “…you cannot currently specify which nodes will receive replicated data.”?

If not, does anyone know if/when this will be available?

How clustered indexing works: http://docs.splunk.com/Documentation/Splunk/5.0.4/Indexer/Howclusteredindexingworks

server.conf.spec: http://docs.splunk.com/Documentation/Splunk/5.0.4/admin/Serverconf

svasan_splunk · 09-17-2013

No, it will likely cause lots of problems.

The master node picks the targets of replication, whether for a newly created (i.e. hot) bucket or for a warm/cold bucket. It has no concept of two different sets of nodes and picks targets at random out of one global pool. So for either hot or warm/cold buckets, it could pick a node in the same data center as the target.

For a warm/cold bucket, if replication of a bucket fails because of acceptFrom, the master would schedule another replication for that bucket. This would repeat until, by random choice, it finally picks two nodes in different data centers (so with two data centers, say about two tries to get it right).
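To put a rough number on the "about two tries" estimate, here is a minimal Python sketch. It is only a back-of-the-envelope model, not Splunk's actual scheduling logic: it assumes two equally sized data centers, a target picked uniformly at random from every peer except the source, and independent retries. The function name and the node counts are made up for illustration.

```python
from fractions import Fraction


def expected_tries(nodes_per_dc: int) -> Fraction:
    """Expected number of replication attempts until a uniformly random
    target lands in the other data center (assumed model, see lead-in)."""
    if nodes_per_dc < 1:
        raise ValueError("need at least one node per data center")
    # Candidates are all peers except the source: (n - 1) local + n remote.
    candidates = 2 * nodes_per_dc - 1
    p_remote = Fraction(nodes_per_dc, candidates)
    # Mean of a geometric distribution with success probability p_remote.
    return 1 / p_remote


if __name__ == "__main__":
    print(expected_tries(5))          # 9/5
    print(float(expected_tries(50)))  # 1.98
```

Under these assumptions the expected number of attempts is (2n - 1)/n for n peers per data center, which approaches 2 as the cluster grows.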

For a hot bucket, the situation is worse. The source (or originating) node rolls the hot bucket on replication failure. If you have RF=3 and two data centers, it is likely that at least one of the two targets is local. Since local replication is blocked, the hot bucket replication to that target would fail, and so the source would roll that bucket. This would happen repeatedly, since every hot bucket created is replicated to two other nodes, one of which is likely local. End result: lots of small buckets, which would very badly impact search.
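The "likely local" claim can be checked with a small Python sketch. Again, this is an illustrative model rather than Splunk's real target selection: it assumes two equally sized data centers and that the RF - 1 targets are distinct peers drawn uniformly at random from everything except the source. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_at_least_one_local(nodes_per_dc: int, replication_factor: int) -> Fraction:
    """Probability that at least one replication target sits in the
    source's own data center (assumed uniform-random model)."""
    targets = replication_factor - 1        # copies beyond the source
    candidates = 2 * nodes_per_dc - 1       # every peer except the source
    if targets < 1 or targets > candidates:
        raise ValueError("replication factor does not fit this cluster size")
    # All targets are remote only if every pick comes from the n remote peers.
    p_all_remote = Fraction(comb(nodes_per_dc, targets), comb(candidates, targets))
    return 1 - p_all_remote


if __name__ == "__main__":
    print(p_at_least_one_local(5, 3))  # 13/18
```

With five peers per data center and RF=3, roughly 72% of hot buckets would have at least one blocked local target under this model, so most hot buckets would be rolled early.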
