
Kafka Partitions Reassignment Performance Impact

I have a Kafka production cluster with 5 nodes and about 500 topics. I need to expand the cluster by adding 2 new nodes, and since Kafka does not provide automatic data repartitioning, I plan to run kafka-reassign-partitions.sh, which ships with the Kafka distribution, to rebalance all my topics across the resulting 7 nodes.

Since I already have a large amount of production data in my cluster,

  1. Will running this script block any concurrent writes to my Kafka topics?
  2. Will running this script slow down my cluster, producers, or consumers?
  3. How can I stop this script while it is in progress if my cluster starts misbehaving during its execution?

I am currently using Kafka v0.8.2.0 with multiple producers and multiple consumers.

Vijay Kansal asked Sep 12 '16 13:09


1 Answer

What kafka-reassign-partitions.sh does is:

  1. Create new replicas on the new brokers as needed
  2. Have them replicate data until they catch up to the leader
  3. Trigger leader elections where needed
  4. Delete replicas where needed
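
To make the four steps concrete, here is a minimal Python sketch that compares one partition's current and target replica lists and reports which replicas would be created, which would be deleted, and whether a new leader would be needed. The function name and the broker ids are made up for illustration, and it assumes the first broker in the current list is the leader, which is a simplification rather than something the tool guarantees.

```python
def diff_replicas(current, target):
    """Compare a partition's current and target replica lists (broker ids)."""
    # Step 1: replicas that must be created on brokers not yet hosting the partition
    added = [b for b in target if b not in current]
    # Step 4: replicas that are deleted once the new ones have caught up
    removed = [b for b in current if b not in target]
    # Step 3: assumption for this sketch, the first current replica is the leader,
    # so a leader election is needed only if that broker is being removed
    needs_new_leader = bool(current) and current[0] in removed
    return {"added": added, "removed": removed, "needs_new_leader": needs_new_leader}


# Hypothetical partition moving from brokers 1,2,3 to brokers 6,2,3
change = diff_replicas(current=[1, 2, 3], target=[6, 2, 3])
print(change)
```

In this example broker 6 has to replicate the whole partition before broker 1 can be dropped, and that catch-up copy (step 2) is where the extra disk and network load comes from.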

The leader election phase will delay writes (like any leader failover). Consumers and producers may slow down because the extra replication takes disk and network resources, sometimes significant ones.

You can't stop this while it is in progress. I mean, you can delete the relevant node from ZK, but that wasn't really tested, and the new replicas already created will stick around... I wouldn't try. If you are concerned, I recommend moving one partition at a time.
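
One way to move a partition at a time is to split a full reassignment plan into single-partition plans and run them one after another. The sketch below assumes the plan uses the JSON layout the tool reads through --reassignment-json-file (a "version" field plus a "partitions" list of topic, partition, and replicas). The topic name, broker ids, and the split_plan helper are hypothetical.

```python
import json


def split_plan(plan):
    """Split one reassignment plan into one-partition plans, preserving order."""
    return [
        {"version": plan.get("version", 1), "partitions": [entry]}
        for entry in plan["partitions"]
    ]


# Hypothetical plan moving two partitions of an "orders" topic onto brokers 6 and 7
full_plan = {
    "version": 1,
    "partitions": [
        {"topic": "orders", "partition": 0, "replicas": [1, 6]},
        {"topic": "orders", "partition": 1, "replicas": [2, 7]},
    ],
}

for i, step in enumerate(split_plan(full_plan)):
    # Each JSON string could be saved to its own file and executed separately
    print(f"step-{i}.json:", json.dumps(step))
```

Each resulting file would be passed to a separate --execute run, waiting for --verify to report completion before starting the next, so that only one partition's worth of data is being copied at any moment.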

In 0.10.1.0 (now going to feature freeze), we'll add the capability to throttle the re-assignment work, which will limit the performance impact on producers and consumers.
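
As a rough guide to what a throttle means in practice, the sketch below estimates how long a reassignment would take at a given rate. It treats the throttle as a single cap in bytes per second on all the data being moved, which is a simplification, and the 500 GB and 50 MB/s figures are invented for the example. The result is a lower bound, since replication may run slower than the cap.

```python
def estimated_seconds(bytes_to_move, throttle_bytes_per_sec):
    """Lower-bound duration of a reassignment if the throttle is the only limit."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be a positive number of bytes per second")
    return bytes_to_move / throttle_bytes_per_sec


# Hypothetical numbers: 500 GB to move at a 50 MB/s throttle (decimal units)
seconds = estimated_seconds(500 * 10**9, 50 * 10**6)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

A lower throttle protects producers and consumers more but stretches the reassignment out, so the trade-off is worth working out before picking a value.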

Gwen Shapira answered Oct 20 '22 12:10