I am running an Apache Kafka cluster of five nodes, backed by an Apache ZooKeeper cluster of three nodes.
In the zookeeper.properties file:
server.1=zNode01:2888:3888
server.2=zNode02:2888:3888
server.3=zNode03:2888:3888
And in the server.properties file:
zookeeper.connect=zNode01:2181,zNode02:2181,zNode03:2181
I want to add a new ZooKeeper node:
Do I need to add the new ZooKeeper node's address to the existing zookeeper.properties files and restart ZooKeeper, or is there another way to do it?
Do I need to add the new ZooKeeper node's address to Kafka's server.properties file and restart Kafka, or is there another way to do it?
In a typical architecture, a Kafka cluster is served by 3 ZooKeeper nodes. For a very large deployment this can be increased to 5 ZooKeeper nodes, but that adds load on the ensemble: every node has to stay in sync, and all metadata-related activity is handled by ZooKeeper.
Generally, a typical Kafka cluster will be well served by three ZooKeeper nodes. If a Kafka deployment is particularly large, then consider utilizing five ZooKeeper nodes.
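To make the 3 vs 5 choice concrete, here is a minimal sketch of the majority arithmetic behind it, assuming the usual rule that a ZooKeeper ensemble stays available only while a majority of its voting members is up. The function names are my own, not part of any ZooKeeper API.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of an ensemble with `voters` voting members."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting members can be down while a majority is still up."""
    return voters - quorum_size(voters)


for n in (3, 4, 5):
    print(f"{n} voters: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With these numbers, going from 3 to 4 voting members still tolerates only one failure, while 5 tolerates two, which is why ensembles are normally sized at 3 or 5.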
It's more involved than what @cricket_007 described. This is a good read before you attempt to add a new member to an existing ZooKeeper cluster:
https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html
Focus specifically on the "Modifying the current dynamic configuration" section.
Basically, these are the high-level steps:
a) The new server has to be introduced to the leader. This is done by putting the joiner itself plus "enough cluster information" in the joiner's zookeeper.properties file so that it can connect to the existing leader. The configuration doesn't need to be absolutely up to date, just fresh enough to reach the current leader. To do that, you could copy the zookeeper.properties file from one of the nodes in the cluster, append the joiner's information to it, and start the ZooKeeper server on the joiner node.
b) Note that the joiner being able to talk to the leader doesn't automatically make it part of the cluster. The ZooKeeper ensemble has to vote on adding the new node. At this point the joiner is a non-voting follower, and if you look at the current configuration of the ensemble (via zkCli's "config" command), you will not see the new node listed.
c) Now use zkCli's "reconfig" command to add the new node to the cluster, either as a voting participant or as an observer. Voting participants take part in all consensus decisions (e.g. who the new leader is, whether to commit a write); observers do not. Observers are added primarily to increase the read throughput of the ensemble without the extra overhead of involving them in the 2-phase commit for each write operation. The reconfig command itself goes through this 2-phase commit: the leader gathers votes from the voting participants on whether the new node should be added. If a quorum of the existing participants agrees, the new node is added to the cluster.
d) Executing zkCli's "config" command now shows the new node as part of the cluster, either as a voting participant or as an observer.
e) Lastly, update Kafka's server.properties file to close the loop. Even though this change might not be needed immediately, it tells the Kafka broker (which is a ZooKeeper client) that the new member is available, so that it can fall back to the newly added node in failure scenarios.
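As a rough illustration of steps a), c) and e), the sketch below only builds the text you would put in the files or type into zkCli; it does not talk to ZooKeeper. The joiner name zNode04, the id 4, and all function names are hypothetical. The server spec format (host:quorumPort:electionPort:role;clientPort) follows my reading of the reconfig guide linked above, which also lists prerequisites (for example the reconfigEnabled setting in 3.5.3 and later) that this sketch does not cover.

```python
def joiner_config(existing_servers, joiner_id, joiner_host,
                  quorum_port=2888, election_port=3888):
    """Step a: existing server.N lines plus one line for the joiner."""
    lines = list(existing_servers)
    lines.append(f"server.{joiner_id}={joiner_host}:{quorum_port}:{election_port}")
    return lines


def reconfig_add_command(joiner_id, joiner_host, role="participant",
                         quorum_port=2888, election_port=3888,
                         client_port=2181):
    """Step c: the command text to enter in zkCli to add the joiner."""
    if role not in ("participant", "observer"):
        raise ValueError("role must be 'participant' or 'observer'")
    return (f"reconfig -add server.{joiner_id}={joiner_host}:"
            f"{quorum_port}:{election_port}:{role};{client_port}")


def add_to_connect_string(connect, host, client_port=2181):
    """Step e: append host:port to a zookeeper.connect value, keeping any chroot."""
    hosts, slash, chroot = connect.partition("/")
    entries = hosts.split(",")
    new_entry = f"{host}:{client_port}"
    if new_entry not in entries:
        entries.append(new_entry)
    return ",".join(entries) + slash + chroot


# Values from the question; zNode04 / id 4 are made up for the example.
existing = [
    "server.1=zNode01:2888:3888",
    "server.2=zNode02:2888:3888",
    "server.3=zNode03:2888:3888",
]
print("\n".join(joiner_config(existing, 4, "zNode04")))
print(reconfig_add_command(4, "zNode04"))
print("zookeeper.connect=" + add_to_connect_string(
    "zNode01:2181,zNode02:2181,zNode03:2181", "zNode04"))
```

After running the reconfig command in zkCli, the "config" command from step d) is how you would confirm that server.4 actually made it into the ensemble.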
Hope the answer helps in understanding how dynamically adding a new node to zookeeper cluster works.