
Is Zookeeper a must for Kafka? [closed]

In Kafka, I would like to use only a single broker, a single topic, and a single partition, with one producer and multiple consumers (each consumer getting its own copy of the data from the broker). Given this, I do not want the overhead of running Zookeeper. Can I not just use the broker alone? Why is Zookeeper a must?

Paaji asked Oct 09 '22 09:10



1 Answer

Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:

Step 2: Start the server

Kafka uses zookeeper so you need to first start a zookeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node zookeeper instance.
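The ordering in the quoted step (Zookeeper first, then the broker) can be sketched roughly as follows. This is only an illustration, assuming a Zookeeper-based Kafka release unpacked at a hypothetical /opt/kafka, the bundled zookeeper-server-start.sh and kafka-server-start.sh scripts with their default config files, and Zookeeper listening on localhost:2181 as in the bundled zookeeper.properties. The helper names (startup_commands, wait_for_port, start_single_node) are made up for this example.

```python
import socket
import subprocess
import time
from pathlib import Path

# Assumption: where the Kafka distribution was unpacked; adjust to your install.
KAFKA_HOME = Path("/opt/kafka")


def startup_commands(kafka_home):
    """Return the two launch commands in the order they must run."""
    bin_dir = kafka_home / "bin"
    conf_dir = kafka_home / "config"
    return [
        # 1. Single-node Zookeeper via the convenience script shipped with Kafka.
        [str(bin_dir / "zookeeper-server-start.sh"),
         str(conf_dir / "zookeeper.properties")],
        # 2. The Kafka broker, which connects to Zookeeper on startup.
        [str(bin_dir / "kafka-server-start.sh"),
         str(conf_dir / "server.properties")],
    ]


def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP connection succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def start_single_node(kafka_home=KAFKA_HOME):
    """Start Zookeeper, wait until it accepts connections, then start the broker."""
    zk_cmd, broker_cmd = startup_commands(kafka_home)
    zookeeper = subprocess.Popen(zk_cmd)
    # Assumption: Zookeeper's client port is 2181 on localhost.
    if not wait_for_port("localhost", 2181):
        zookeeper.terminate()
        raise RuntimeError("Zookeeper did not come up; not starting the broker")
    broker = subprocess.Popen(broker_cmd)
    return zookeeper, broker
```

Calling start_single_node() returns both process handles, so the caller can later terminate the broker first and Zookeeper second.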

As to why: people discovered long ago that you need some way to coordinate tasks, state management, configuration, etc. across a distributed system. Some projects have built their own mechanisms (think of the configuration server in a MongoDB sharded cluster, or a master node in an Elasticsearch cluster). Others have chosen to take advantage of Zookeeper as a general-purpose distributed process coordination system. Kafka, Storm, HBase, and SolrCloud, to name just a few, all use Zookeeper to help manage and coordinate.

Kafka is a distributed system and is built to use Zookeeper. The fact that you are not using any of the distributed features of Kafka does not change how it was built. In any event, there should not be much overhead from using Zookeeper. A bigger question is why you would use this particular design pattern: a single-broker deployment of Kafka misses out on all of the reliability features of a multi-broker cluster, along with its ability to scale.

John Petrone answered Nov 09 '22 19:11