
Fixing under-replicated partitions in Kafka

Tags:

apache-kafka

In our production environment, we often see partitions become under-replicated while messages are being consumed from the topics. We are using Kafka 0.11. From the documentation, what I understand is:

Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.

Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since last fetch request from the replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms will be considered out of sync.
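To make the quoted rule concrete, here is a minimal sketch of the time-based check. It is a simplified model, not the broker's actual code: `ReplicaState`, `last_caught_up_ms` and `in_sync_followers` are hypothetical names, and the 10,000 ms default is an assumption for 0.11-era brokers, so check `replica.lag.time.max.ms` in your own broker config.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaState:
    broker_id: int
    # Hypothetical field: last time (ms) this follower had fully caught up
    # to the leader's log end offset.
    last_caught_up_ms: int


def in_sync_followers(replicas: List[ReplicaState],
                      now_ms: int,
                      replica_lag_time_max_ms: int = 10_000) -> List[int]:
    """Return broker ids whose last catch-up is within the allowed lag time.

    A follower that keeps fetching but never reaches the log end does not
    update last_caught_up_ms, so it eventually drops out of the result.
    """
    return [
        r.broker_id
        for r in replicas
        if now_ms - r.last_caught_up_ms <= replica_lag_time_max_ms
    ]
```

Note that the number of messages a follower is behind does not appear anywhere in the check, which matches the removal of replica.lag.max.messages.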

How do we fix this issue? What are the different reasons for replicas to go out of sync? In our scenario, all the Kafka brokers are in a single rack of blade servers, and all use the same network with 10 Gbps Ethernet (simplex). I do not see any reason for the replicas to go out of sync because of the network.

wandermonk asked Jul 24 '18 05:07




2 Answers

We faced the same issue. The solution was:

  1. Restart the ZooKeeper leader.
  2. Restart the broker or brokers that are not replicating some of the partitions.

No data was lost.

The issue was caused by a faulty state in ZooKeeper. There was an open ZooKeeper issue for this, but I don't remember the number.
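To work out which brokers step 2 applies to, you can compare each partition's Replicas list with its Isr list. The sketch below parses text in the shape printed by `kafka-topics.sh --describe`. The line format is assumed from the 0.11-era tool and may differ in other versions, and `lagging_brokers` is a hypothetical helper name, not part of Kafka.

```python
import re

# Matches one partition line of the describe output, for example:
#   "\tTopic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2"
# The per-topic summary line ("PartitionCount:...") does not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*\S+\s+Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def lagging_brokers(describe_output):
    """Map (topic, partition) to the sorted broker ids assigned but not in the ISR."""
    result = {}
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue  # skip summary lines and anything unrecognised
        replicas = {int(b) for b in match["replicas"].split(",")}
        isr = {int(b) for b in match["isr"].split(",") if b}
        missing = sorted(replicas - isr)
        if missing:
            result[(match["topic"], int(match["partition"]))] = missing
    return result
```

Any broker id that shows up in the result is a candidate for the restart in step 2. Partitions whose Isr matches Replicas are left out of the result.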

Doron Levi answered Oct 17 '22 15:10


I faced the same issue on Kafka 2.0. After restarting the Kafka controller node, all the replicas caught up.

I am still looking for the reason why a few partitions become under-replicated while other partitions of the same topic on the same nodes work fine. The issue shows up on random partitions.

Satish Bellapu answered Oct 17 '22 14:10