 

What is a partition leader in Apache Kafka?

Are Kafka leaders partitions themselves, or are they brokers? My initial understanding was that they were partitions which acted as read/write agents and then deferred their values to the ISRs.

However, I have recently heard them mentioned as though leadership happens at the "broker" level, hence my confusion.

I know there are other posts which aim to answer this question, but the answers there did not help.

Matt asked Mar 24 '20 17:03



3 Answers

Some of the answers here are not entirely correct, so I would like to clarify.

Every partition has exactly one partition leader, which handles all read/write requests for that partition. (Update: from Kafka 2.4.0, consumers are allowed to read from follower replicas.)
If the replication factor is greater than 1, the additional partition replicas act as partition followers.
Kafka guarantees that every replica of a partition resides on a different broker (whether it is the leader or a follower), so the maximum replication factor is the number of brokers in the cluster.
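
To make the placement rule concrete, here is a minimal Python sketch of a round-robin replica assignment. It is a simplified illustration, not Kafka's actual assignment algorithm; the function name `assign_replicas` and the broker ids are hypothetical.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Toy round-robin placement: replicas of one partition land on distinct brokers."""
    if replication_factor > len(broker_ids):
        # Each replica needs its own broker, so this request cannot be satisfied.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition and take consecutive brokers.
        assignment[p] = [
            broker_ids[(p + r) % len(broker_ids)] for r in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(num_partitions=4, replication_factor=2, broker_ids=[0, 1, 2])
for partition, replicas in layout.items():
    print(partition, replicas)
```

In this sketch the first broker in each list plays the role of the leader, and asking for a replication factor of 4 on three brokers raises an error, which mirrors the limit described above.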

Every partition follower reads messages from the partition leader (acting like a kind of consumer) and, by default, does not serve any consumers of that partition (only the partition leader serves reads and writes).
A partition follower is considered in-sync if it is reading records from the partition leader without lagging behind and without losing its connection to ZooKeeper (the default maximum lag is 10 seconds and the default ZooKeeper session timeout is 6 seconds; both are configurable via replica.lag.time.max.ms and zookeeper.session.timeout.ms, and Kafka 2.5 raised the defaults to 30 and 18 seconds).
If a partition follower lags behind or loses its connection to ZooKeeper, it is considered out-of-sync.
When a partition leader shuts down for any reason (e.g. a broker shuts down), one of its in-sync partition followers becomes the new leader.
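
The ISR and failover rules above can be sketched as a small Python model. This is a toy illustration under stated assumptions: the `Partition` class and its methods are hypothetical, the new leader is taken as the first assigned replica that is still in sync, and unclean leader election is assumed to be disabled.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    leader: int
    replicas: list  # assigned replicas (broker ids), in assignment order
    isr: set = field(default_factory=set)

    def mark_out_of_sync(self, broker_id):
        # A follower that lags too long or loses its session leaves the ISR.
        if broker_id != self.leader:
            self.isr.discard(broker_id)

    def fail_leader(self):
        # The leader's broker went down: drop it and promote an in-sync follower.
        self.isr.discard(self.leader)
        candidates = [b for b in self.replicas if b in self.isr]
        # With no in-sync follower left, the partition has no leader in this model.
        self.leader = candidates[0] if candidates else None
        return self.leader


p = Partition(leader=1, replicas=[1, 2, 3], isr={1, 2, 3})
p.mark_out_of_sync(2)       # broker 2 falls behind
new_leader = p.fail_leader()  # broker 1 shuts down
print(new_leader, sorted(p.isr))
```

Broker 2 is skipped because it is out of sync, so broker 3 takes over as leader.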

The replication section of the Kafka documentation explains this in detail.
Confluent also wrote a nice blog about this topic.

Ofek Hod answered Nov 15 '22 10:11


tl;dr

Are kafka leaders partitions themselves or are they brokers?

The partition leader is a Kafka Broker.


Partition Leader

The Kafka documentation states this explicitly:

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.

Therefore, a partition leader is actually the broker that serves this purpose and is responsible for all read and write requests for this particular partition.


Partition Leader Election

The assignment of a leader for a particular partition happens during a process called partition leader election. This process happens when the topic/partition is created or when the partition leader (i.e. the broker) is unavailable for any reason.

Additionally, you can force a preferred replica election by using the Preferred Replica Leader Election Tool:

With replication, each partition can have multiple replicas. The list of replicas for a partition is called the "assigned replicas". The first replica in this list is the "preferred replica". When topic/partitions are created, Kafka ensures that the "preferred replica" for the partitions across topics are equally distributed amongst the brokers in a cluster. In an ideal scenario, the leader for a given partition should be the "preferred replica". This guarantees that the leadership load across the brokers in a cluster are evenly balanced. However, over time the leadership load could get imbalanced due to broker shutdowns (caused by controlled shutdown, crashes, machine failures etc). This tool helps to restore the leadership balance between the brokers in the cluster.

To do so, run the following command (note that from Kafka 2.4 this script is deprecated in favor of kafka-leader-election.sh):

bin/kafka-preferred-replica-election.sh --zookeeper localhost:12913/kafka --path-to-json-file topicPartitionList.json

where the content of topicPartitionList.json should look like the one below:

{
 "partitions":
  [
    {"topic": "topic1", "partition": 0},
    {"topic": "topic1", "partition": 1},
    {"topic": "topic1", "partition": 2},
    {"topic": "topic2", "partition": 0},
    {"topic": "topic2", "partition": 1}
  ]
}
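
As a rough illustration of how such a file could be produced, the Python sketch below lists the partitions whose current leader is not the preferred replica (the first entry in the assigned replicas) and prints them in the JSON shape shown above. The `cluster_state` snapshot and the helper name are hypothetical; in practice the data would come from the cluster's metadata.

```python
import json


def partitions_needing_election(state):
    """state maps (topic, partition) -> {"leader": id, "replicas": [ids]}."""
    return [
        {"topic": topic, "partition": partition}
        for (topic, partition), info in sorted(state.items())
        # The preferred replica is the first broker in the assigned replicas list.
        if info["leader"] != info["replicas"][0]
    ]


# Hypothetical snapshot of leaders and assigned replicas.
cluster_state = {
    ("topic1", 0): {"leader": 1, "replicas": [1, 2]},
    ("topic1", 1): {"leader": 1, "replicas": [2, 1]},
    ("topic2", 0): {"leader": 3, "replicas": [2, 3]},
}

payload = {"partitions": partitions_needing_election(cluster_state)}
print(json.dumps(payload, indent=2))
```

The printed JSON could then be saved as topicPartitionList.json and passed to the election tool.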

How to find which broker serves as the partition leader

In order to find which broker serves as the partition leader and which serve as In-Sync Replicas (ISR), you have to run the following command:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic myTopic

and the output should look similar to the one below:

Topic:myTopic       PartitionCount:4        ReplicationFactor:1     Configs:
    Topic: myTopic      Partition: 0    Leader: 2       Replicas: 2     Isr: 2
    Topic: myTopic      Partition: 1    Leader: 3       Replicas: 3     Isr: 3
    Topic: myTopic      Partition: 2    Leader: 4       Replicas: 4     Isr: 4
    Topic: myTopic      Partition: 3    Leader: 0       Replicas: 0     Isr: 0
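
If you want to read the leaders programmatically, a small Python sketch like the one below can parse that text. It assumes the output layout shown above, which may differ between Kafka versions; the regular expression and the function name `parse_describe` are illustrative only.

```python
import re

DESCRIBE_OUTPUT = """\
Topic:myTopic       PartitionCount:4        ReplicationFactor:1     Configs:
    Topic: myTopic      Partition: 0    Leader: 2       Replicas: 2     Isr: 2
    Topic: myTopic      Partition: 1    Leader: 3       Replicas: 3     Isr: 3
    Topic: myTopic      Partition: 2    Leader: 4       Replicas: 4     Isr: 4
    Topic: myTopic      Partition: 3    Leader: 0       Replicas: 0     Isr: 0
"""

# Matches only the per-partition lines; the summary line has no "Partition:" field.
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    rows = {}
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            rows[int(m["partition"])] = {
                "leader": int(m["leader"]),
                "replicas": [int(b) for b in m["replicas"].split(",")],
                "isr": [int(b) for b in m["isr"].split(",") if b],
            }
    return rows


leaders = {p: row["leader"] for p, row in parse_describe(DESCRIBE_OUTPUT).items()}
print(leaders)  # {0: 2, 1: 3, 2: 4, 3: 0}
```

The numbers in the Leader, Replicas and Isr columns are broker ids, which is why the leader is a broker rather than a partition.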
Giorgos Myrianthous answered Nov 15 '22 10:11


The partition leader concept becomes relevant when a Kafka topic has a --replication-factor greater than 1 (which also means the cluster must have at least as many brokers as the replication factor).

In that scenario, whenever a producer sends a message to a topic partition, the request first goes to that partition's leader (among all replicas of the partition in the Kafka cluster). The leader stores the message, the followers replicate it, and only then does the leader send the acknowledgement to the producer (this ordering applies when the producer uses acks=all).

Only after this process completes is the message available for consumers to read.
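
The visibility rule described above can be sketched with a toy Python model of a leader's log. This is an illustration only, not Kafka's implementation: the `PartitionLog` class is hypothetical, and a follower fetch is assumed to catch up completely in one step.

```python
class PartitionLog:
    """Toy model: the leader's log plus each in-sync follower's replicated offset."""

    def __init__(self, isr_followers):
        self.records = []
        self.follower_offsets = {b: 0 for b in isr_followers}

    def append(self, record):
        # The leader writes the record to its own log first.
        self.records.append(record)

    def follower_fetch(self, broker_id):
        # Followers pull from the leader; here one fetch catches up fully.
        self.follower_offsets[broker_id] = len(self.records)

    @property
    def high_watermark(self):
        # Offset up to which every in-sync replica holds the data.
        return min([len(self.records), *self.follower_offsets.values()])

    def consumer_visible(self):
        # Consumers only see records below the high watermark.
        return self.records[: self.high_watermark]


log = PartitionLog(isr_followers=[2, 3])
log.append("m1")
before = log.consumer_visible()  # followers have not replicated yet
log.follower_fetch(2)
log.follower_fetch(3)
after = log.consumer_visible()   # all in-sync replicas now hold "m1"
print(before, after)
```

The record becomes visible to consumers only once both in-sync followers have fetched it.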

I recommend the official documentation for further details.

Yogi answered Nov 15 '22 10:11