 

Why does the Kafka producer take a broker endpoint instead of the ZooKeeper endpoint when being initialized?

Tags:

apache-kafka

If I have multiple brokers, which broker should my producer use? Do I need to manually switch brokers to balance the load? Also, why does the consumer only need a ZooKeeper endpoint instead of a broker endpoint?

A quick example from the tutorial:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
asked Mar 16 '14 at 23:03 by Erben Mo




2 Answers

which broker should my producer use?
Do I need to manually switch the broker to balance the load?

Kafka runs as a cluster, meaning a set of nodes, so when producing you need to give the producer the LIST of brokers you've configured for your application. Below is a short note taken from the documentation:

“metadata.broker.list” defines where the Producer can find one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster but should include at least two in case the first Broker is not available. No need to worry about figuring out which Broker is the leader for the topic (and partition): the Producer knows how to connect to a Broker, ask for the metadata, and then connect to the correct Broker.
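The "try the list until one answers" behaviour can be illustrated with a toy bash model. It is not Kafka's client code: the broker names, the UP_BROKERS set, and the functions is_up and first_reachable are hypothetical, and it only shows why listing a single bootstrap broker is fragile. (In the newer Java producer the equivalent setting is bootstrap.servers.)

```bash
#!/usr/bin/env bash
# Toy model (not Kafka's client code): a producer walks its bootstrap
# list in order and asks the first broker that answers for metadata.
# Broker names and the set of "live" brokers are hypothetical.

# Simulated liveness: only these brokers answer requests.
UP_BROKERS="broker2:9092 broker3:9092"

# is_up BROKER: succeed if BROKER is one of the live brokers.
is_up() {
  case " $UP_BROKERS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# first_reachable LIST: print the first live broker from a
# comma-separated list, or return 1 if none of them answers.
first_reachable() {
  local IFS=','
  local broker
  for broker in $1; do
    if is_up "$broker"; then
      echo "$broker"
      return 0
    fi
  done
  return 1
}
```

With `first_reachable "broker1:9092,broker2:9092"` the dead first entry is skipped and `broker2:9092` is printed; with only `broker1:9092` the call fails, which is the situation the "at least two" advice guards against.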

Hope this clears up some of your confusion.

Also, why does the consumer only need a ZooKeeper endpoint instead of a broker endpoint?

This is not technically correct: there are two consumer APIs available, the high-level consumer and the low-level (Simple) consumer.

The high-level consumer takes care of most things, such as leader detection and threading issues, but does not give you much control over message consumption. That control is exactly the purpose of the alternative, the Simple (low-level) consumer, where you will see that you need to provide the brokers and partition details yourself.

So the consumer needs only the ZooKeeper endpoint when you use the high-level API; with the Simple consumer you do need to provide the broker and partition information.

answered Sep 23 '22 at 00:09 by user2720864


Kafka sets a single broker as the leader for each partition of each topic. The leader is responsible for handling both reads and writes to that partition. You cannot decide to read or write from a non-leader broker.
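To make the one-leader-per-partition rule concrete, here is a toy bash model (bash 4+ for the associative array). It is not Kafka code: the partition names test-0 and test-1, the broker names, and the function accept_write are hypothetical, and a real broker answers a misdirected request with a not-leader error code rather than a printed message.

```bash
#!/usr/bin/env bash
# Toy model (not Kafka's broker code): each partition has exactly one
# leader, and only that broker accepts writes for the partition.
# Partition and broker names are hypothetical.
declare -A LEADER=(
  [test-0]=broker1
  [test-1]=broker2
)

# accept_write BROKER PARTITION: succeed only if BROKER leads PARTITION.
accept_write() {
  local broker="$1" partition="$2"
  if [ "${LEADER[$partition]:-}" = "$broker" ]; then
    echo "accepted by $broker"
  else
    echo "rejected: $broker is not the leader of $partition" >&2
    return 1
  fi
}
```

`accept_write broker1 test-0` succeeds, while `accept_write broker1 test-1` is rejected because broker2 leads that partition.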

So, what does it mean to provide a broker or list of brokers to the kafka-console-producer ? Well, the broker or brokers you provide on the command-line are just the first contact point for your producer. If the broker you list is not the leader for the topic/partition you need, your producer will get the current leader info (called "topic metadata" in kafka-speak) and reconnect to other brokers as necessary before sending writes. In fact, if your topic has multiple partitions it may even connect to several brokers in parallel (if the partition leaders are different brokers).
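The metadata step described above can be sketched as a toy bash model (bash 4+). Assumptions: the METADATA map stands in for the topic metadata a bootstrap broker returns, and the names leader_for and brokers_needed, plus all topic and broker names, are hypothetical. The point it shows is why one producer may hold connections to several brokers at once when partition leaders differ.

```bash
#!/usr/bin/env bash
# Toy model (not Kafka's client code) of how a producer uses topic
# metadata: a bootstrap broker returns a partition -> leader map, and
# each record is then sent to the leader of its partition.
# Topic, partition and broker names are hypothetical.
declare -A METADATA=(
  [test-0]=broker1:9092
  [test-1]=broker2:9092
  [test-2]=broker1:9092
)

# leader_for PARTITION: print the leader broker for a key like "test-1".
leader_for() {
  echo "${METADATA[$1]}"
}

# brokers_needed PARTITION...: given one partition key per record,
# print the distinct brokers the producer must connect to, sorted.
brokers_needed() {
  local p
  for p in "$@"; do
    leader_for "$p"
  done | sort -u
}
```

`brokers_needed test-0 test-1 test-2` lists both broker1:9092 and broker2:9092, whereas records only for test-0 and test-2 need a single connection to broker1:9092.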

Second question: why does the consumer require a ZooKeeper list for connections instead of a broker list? The answer is that Kafka consumers can operate in "groups", and ZooKeeper is used to coordinate those groups (how groups work is a larger issue, beyond the scope of this question). ZooKeeper also stores the broker lists for topics, so the consumer can pull them directly from ZooKeeper, making an additional --broker-list a bit redundant.
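A toy bash model (bash 4+) of that discovery step: brokers register under /brokers/ids/<id> in ZooKeeper, the path ZooKeeper-based Kafka used for broker registration, and a consumer reads those entries instead of taking a broker list. The ZNODES map and discover_brokers are hypothetical stand-ins; real registrations are JSON documents, not bare host:port strings.

```bash
#!/usr/bin/env bash
# Toy model (not Kafka or ZooKeeper code): brokers register under
# /brokers/ids/<id>, and a ZooKeeper-based consumer reads those entries
# to discover brokers, so it needs no --broker-list of its own.
# Values are simplified to host:port; real registrations are JSON.
declare -A ZNODES=(
  [/brokers/ids/0]=broker1:9092
  [/brokers/ids/1]=broker2:9092
)

# discover_brokers: print every registered broker endpoint, sorted so
# the output is stable (associative array order is unspecified).
discover_brokers() {
  local path
  for path in "${!ZNODES[@]}"; do
    case "$path" in
      /brokers/ids/*) echo "${ZNODES[$path]}" ;;
    esac
  done | sort
}
```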

answered Sep 23 '22 at 00:09 by dpkp