I would like to deploy a Kafka cluster across two datacenters with the same number of nodes in each DC. The first DC is used in active mode while the second is in passive mode.
For example, let's say that both datacenters have 3 nodes, with two in-sync replicas (ISRs) in the first DC and one ISR in the second DC.
Is it possible to have a third DC containing an arbiter/witness/observer node so that, if one DC fails, a leader election can still succeed with the correct outcome in terms of consistency? MongoDB has such a feature, called a replica set arbiter.
What about deploying ZooKeeper across the three datacenters? As I understand it, ZooKeeper does not hold the Kafka data and should not be contacted for each new record written to a Kafka topic, i.e. you do not pay the latency to the third DC for each new record.
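To make the three-DC idea concrete, here is a minimal sketch of the majority arithmetic behind a ZooKeeper ensemble. The node counts per DC (3+3 versus 2+2+1) and the function names are assumptions chosen for illustration, not a recommended layout. Only voting members are counted; ZooKeeper's own "observer" role does not vote, so a tie-breaking node in the third DC would have to be a voting participant.

```python
def has_quorum(alive_voters: int, total_voters: int) -> bool:
    """A ZooKeeper ensemble needs a strict majority of its voting members."""
    return alive_voters > total_voters // 2


def survives_any_single_dc_loss(voters_per_dc: dict[str, int]) -> bool:
    """True if the ensemble keeps a majority after losing any one datacenter."""
    total = sum(voters_per_dc.values())
    return all(has_quorum(total - lost, total) for lost in voters_per_dc.values())


# Hypothetical layouts: voting ZooKeeper nodes per datacenter.
two_dc = {"dc1": 3, "dc2": 3}
three_dc = {"dc1": 2, "dc2": 2, "dc3": 1}

two_dc_ok = survives_any_single_dc_loss(two_dc)
three_dc_ok = survives_any_single_dc_loss(three_dc)
print("two DCs survive a DC loss:", two_dc_ok)
print("three DCs survive a DC loss:", three_dc_ok)
```

With only two DCs, losing either one leaves at most half of the voters, which is not a majority; a single voter in a third DC is enough to break the tie.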
Whenever a new topic is created, Kafka runs its leader election algorithm to determine the preferred leader of each partition. The first replica in the partition's replica list is the one elected as leader.
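As a rough illustration of the "first replica in the list" rule, the sketch below picks the preferred leader for each partition from a replica assignment. The assignment map and broker IDs are invented for the example.

```python
def preferred_leader(replicas: list[int]) -> int:
    """The preferred leader is the first broker in the partition's replica list."""
    return replicas[0]


# Hypothetical assignment: partition id -> ordered list of broker ids.
assignment = {
    0: [1, 2, 3],
    1: [2, 3, 1],
    2: [3, 1, 2],
}

leaders = {partition: preferred_leader(replicas)
           for partition, replicas in assignment.items()}
print(leaders)
```

Rotating the starting broker across partitions, as in this made-up assignment, is what spreads preferred leadership evenly over the brokers.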
In Kafka, an unclean leader election occurs when an unclean broker (“unclean” because it has not finished replicating the latest data updates from the previous leader) becomes the new leader.
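The difference between a clean and an unclean election can be sketched with a small model. This is a simplification, not Kafka's actual controller logic: `elect_leader` and its arguments are hypothetical, and `allow_unclean` stands in for the `unclean.leader.election.enable` setting.

```python
from typing import Optional


def elect_leader(replicas: list[int], isr: set[int], alive: set[int],
                 allow_unclean: bool) -> Optional[int]:
    """Pick a leader for one partition (simplified model).

    Prefer the first live replica that is still in the ISR. If none exists,
    fall back to any live replica only when unclean election is allowed.
    """
    for broker in replicas:
        if broker in isr and broker in alive:
            return broker  # clean election: no committed data is lost
    if allow_unclean:
        for broker in replicas:
            if broker in alive:
                return broker  # unclean election: may lose committed data
    return None  # partition stays offline


replicas = [1, 2, 3]
isr = {1, 2}  # broker 3 is lagging behind

clean = elect_leader(replicas, isr, alive={2, 3}, allow_unclean=False)
offline = elect_leader(replicas, isr, alive={3}, allow_unclean=False)
unclean = elect_leader(replicas, isr, alive={3}, allow_unclean=True)
print(clean, offline, unclean)
```

In the two-DC scenario, this is the trade-off when the active DC holding the ISR is lost: either the partition stays unavailable, or a lagging replica takes over and some records may be missing.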
The cluster is in a leader-skewed state when a node is the leader for more partitions than (number of partitions / number of brokers). To solve this, Kafka can reassign leadership to the preferred replicas. This can be done in two ways: The broker configuration auto.
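The skew condition described above can be checked with a few lines of code. This is only a sketch: the leader map and broker IDs are invented, and `skewed_brokers` is a hypothetical helper, not part of any Kafka tool.

```python
from collections import Counter


def skewed_brokers(leaders: dict[int, int], brokers: list[int]) -> dict[int, int]:
    """Return brokers leading more partitions than partitions / brokers."""
    fair_share = len(leaders) / len(brokers)
    counts = Counter(leaders.values())
    return {b: counts[b] for b in brokers if counts[b] > fair_share}


# Hypothetical state: partition id -> current leader broker id.
leaders = {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 3}
brokers = [1, 2, 3]

skew = skewed_brokers(leaders, brokers)
print(skew)
```

Here the fair share is 6 / 3 = 2 partitions per broker, so broker 1, which leads four partitions, is reported as skewed.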
Kafka and Quorum. Quorum is the number of acknowledgments required and the number of logs that must be compared to elect a leader such that there is guaranteed to be an overlap for availability. Most systems use a majority vote; Kafka does not use a simple majority vote, in order to improve availability.
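To make the trade-off concrete, here is a minimal sketch of the replica counts involved, based on the rule of thumb in Kafka's replication design notes: a majority-vote log needs 2f+1 replicas to tolerate f failures, while Kafka's ISR approach needs f+1. The function names are made up for illustration.

```python
def replicas_needed_majority(failures: int) -> int:
    """Majority vote: tolerating f failures needs 2f + 1 replicas."""
    return 2 * failures + 1


def replicas_needed_isr(failures: int) -> int:
    """Kafka's ISR approach: tolerating f failures needs f + 1 replicas."""
    return failures + 1


for f in (1, 2, 3):
    print(f"tolerate {f} failure(s): "
          f"majority={replicas_needed_majority(f)}, isr={replicas_needed_isr(f)}")
```

The price of the smaller replica count is that a write must be acknowledged by every replica currently in the ISR, rather than by just a majority.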
There is a presentation from Kafka Summit 2017, "One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers", that discusses this setup. There is also some interesting information in a Confluent whitepaper, "Disaster Recovery for Multi-Datacenter Apache Kafka® Deployments". It says the setup could work and calls the third-DC node an observer node, but it also says no one has ever tried this.
ZooKeeper keeps track of the following metadata for Kafka (0.9.0+).
More detail on the dependency between Kafka and ZooKeeper can be found in the Kafka FAQ and in an answer on Quora from a Kafka committer working at Confluent.
From the resources I have read, a setup with two DCs (Kafka plus ZooKeeper) and an arbiter/witness/observer ZooKeeper node in a third, high-latency DC could work, but I haven't found any resource that has actually tested it.