I have a Kafka cluster of 3 nodes. When node #3 dies, my _schemas
topic stops functioning properly and I see this:
kafka-topics --zookeeper localhost:2181 --topic _schemas --describe
Topic:_schemas PartitionCount:1 ReplicationFactor:2 Configs:cleanup.policy=compact
Topic: _schemas Partition: 0 Leader: -1 Replicas: 3,2 Isr: 2
So it seems that node #3 is dead, and that is what Leader: -1
refers to. But why doesn't Kafka just continue working as usual, electing node #2
as the new leader and replicating the data to node #1
so that there are 2 in-sync replicas again?
The error I see in the Kafka logs:
kafka.common.NotAssignedReplicaException:
Leader 3 failed to record follower 2's position -1 since the replica is not
recognized to be one of the assigned replicas 3 for partition <loop over many partitions>
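For reference, the partition state that the controller last published can also be read straight out of ZooKeeper; if it stops changing while a broker is down, that points at a stuck controller. A minimal check, assuming ZooKeeper is on localhost:2181 (the JSON values shown are illustrative):

zkCli.sh -server localhost:2181
get /brokers/topics/_schemas/partitions/0/state
{"controller_epoch":12,"leader":-1,"version":1,"leader_epoch":5,"isr":[2]}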
Whenever a new topic is created, Kafka runs its leader election algorithm to figure out the preferred leader of a partition: the first replica in the replica list is the one elected as leader.
The controller is one of the Kafka brokers that is also responsible for the task of electing partition leaders (in addition to the usual broker functionality).
If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
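Since the first broker in the replica list is the preferred leader, leadership can also be handed back by triggering a preferred-replica election by hand once the broker rejoins. A minimal sketch, assuming an older Kafka whose tools still take --zookeeper and whose Confluent-style script names are on the PATH:

# Trigger a preferred-replica election for every partition in the cluster
kafka-preferred-replica-election --zookeeper localhost:2181

# Or restrict it to the _schemas partition via a JSON file
echo '{"partitions": [{"topic": "_schemas", "partition": 0}]}' > election.json
kafka-preferred-replica-election --zookeeper localhost:2181 --path-to-json-file election.json

If the controller is healthy, the describe output should then show Leader: 3 again once broker 3 is back in the ISR.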
I solved this problem by restarting the controller broker. Every Kafka cluster has one broker elected as the controller, which coordinates leader election; in our case the controller was stuck. To find out which broker is the controller, open zkCli.sh against the ZooKeeper ensemble your Kafka cluster uses and run get /controller; you will see the brokerId there.
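For example, a minimal zkCli.sh session, assuming ZooKeeper is on localhost:2181 (the brokerid and timestamp shown are illustrative):

zkCli.sh -server localhost:2181
get /controller
{"version":1,"brokerid":2,"timestamp":"1510123456789"}

Restart the broker with that id. Alternatively, deleting the /controller znode (delete /controller in zkCli.sh) forces the remaining brokers to elect a new controller without a full broker restart.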