I have a Kafka production cluster with 5 nodes and about 500 topics. I need to expand the cluster by adding 2 new nodes. Since Kafka does not automatically move existing partitions onto new brokers, I am planning to run kafka-reassign-partitions.sh, which ships with the Kafka distribution, to rebalance all my topics across the 7 nodes.
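A minimal sketch of preparing that rebalance. It assumes the existing brokers have ids 0-4 and the new ones get ids 5 and 6 (hypothetical ids; check broker.id in each server.properties). The tool's --generate mode takes a JSON list of topics plus a target broker list and only prints a proposed plan; it does not move any data. The topic names, file name and ZooKeeper address are placeholders.

```python
import json

def topics_to_move_json(topics):
    """Build the document passed to --topics-to-move-json-file."""
    return json.dumps({"version": 1,
                       "topics": [{"topic": t} for t in topics]}, indent=2)

def generate_command(zookeeper, topics_file, broker_ids):
    """Build the --generate invocation (0.8.x tools talk to ZooKeeper).

    --generate only prints the current and proposed assignments; nothing moves.
    """
    return ["bin/kafka-reassign-partitions.sh",
            "--zookeeper", zookeeper,
            "--topics-to-move-json-file", topics_file,
            "--broker-list", ",".join(str(b) for b in broker_ids),
            "--generate"]

if __name__ == "__main__":
    # Placeholder topics; in practice list all ~500 (e.g. from kafka-topics.sh --list).
    print(topics_to_move_json(["orders", "payments"]))
    # Assumed broker ids 0-6 for the 7-node cluster.
    print(" ".join(generate_command("zk1:2181", "topics.json", range(7))))
```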
Since I already have a large amount of production data in my cluster, what impact will running the reassignment have? I am currently using Kafka v0.8.2.0 with multiple producers and multiple consumers.
What kafka-reassign-partitions.sh does is:
The leader election phase will delay writes (like any leader failover).
Consumers and producers may slow down, because the extra replication consumes disk and network resources (sometimes significant resources).
You can't stop the reassignment while it is in progress. You could delete the relevant znode (/admin/reassign_partitions) from ZooKeeper, but that path isn't really tested, and the new replicas already created will stick around, so I wouldn't try it. If you are concerned, I recommend moving one partition at a time.
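Following the advice to move one partition at a time, the plan printed by --generate can be split into single-partition files. This sketch assumes you saved the two JSON documents that --generate prints (the current assignment and the proposed assignment) and skips partitions whose replica list does not change. Each resulting document would then be passed to --execute via --reassignment-json-file, with --verify run on the same file until it reports completion before starting the next one.

```python
import json

def split_plan(current_json, proposed_json):
    """Return one single-partition reassignment document per partition
    whose replica list differs between the current and proposed plans."""
    current = {(p["topic"], p["partition"]): p["replicas"]
               for p in json.loads(current_json)["partitions"]}
    plans = []
    for p in json.loads(proposed_json)["partitions"]:
        if current.get((p["topic"], p["partition"])) == p["replicas"]:
            continue  # replicas unchanged: nothing to move for this partition
        plans.append(json.dumps({"version": 1, "partitions": [p]}))
    return plans

if __name__ == "__main__":
    # Placeholder plans; real ones come from the --generate output.
    current = json.dumps({"version": 1, "partitions": [
        {"topic": "orders", "partition": 0, "replicas": [0, 1]},
        {"topic": "orders", "partition": 1, "replicas": [1, 2]}]})
    proposed = json.dumps({"version": 1, "partitions": [
        {"topic": "orders", "partition": 0, "replicas": [5, 1]},
        {"topic": "orders", "partition": 1, "replicas": [1, 2]}]})
    for doc in split_plan(current, proposed):
        print(doc)  # save each as its own file, then --execute it alone
```

Moving partitions one by one is slower overall, but it keeps the extra replication load to a single partition's worth of traffic at any moment.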
In 0.10.1.0 (now going to feature freeze), we'll add the capability to throttle the re-assignment work, which will limit the performance impact on producers and consumers.
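On 0.10.1.0 and later, the throttle is passed to --execute as --throttle, in bytes per second. This sketch builds that command and gives a rough lower bound on how long one partition takes to copy (partition size divided by the throttle rate; real time is longer when other replication traffic competes). The file name, ZooKeeper address and sizes are placeholders. Once the move completes, running --verify with the same file also removes the throttle.

```python
def execute_command(zookeeper, plan_file, throttle_bytes_per_sec=None):
    """Build the --execute invocation, optionally throttled."""
    cmd = ["bin/kafka-reassign-partitions.sh", "--zookeeper", zookeeper,
           "--reassignment-json-file", plan_file, "--execute"]
    if throttle_bytes_per_sec is not None:
        # --throttle exists only in Kafka 0.10.1.0 and later, not in 0.8.2.0.
        cmd += ["--throttle", str(int(throttle_bytes_per_sec))]
    return cmd

def min_copy_seconds(partition_bytes, throttle_bytes_per_sec):
    """Rough lower bound on copy time for one partition at the throttle rate."""
    return partition_bytes / throttle_bytes_per_sec

if __name__ == "__main__":
    print(" ".join(execute_command("zk1:2181", "move-0.json", 50_000_000)))
    print(min_copy_seconds(50e9, 50e6))  # 50 GB at 50 MB/s
```

For example, a 50 GB partition at a 50 MB/s throttle needs at least 1000 seconds, roughly 17 minutes.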