I have the following setup:
3 kafka brokers and a 3 zookeeper ensamble
1 topic with 12 partitions and 3 replicas (each kafka broker is thus the leader of 4 partitions)
I stop one of the brokers - it gets removed from the cluster, leadership of its partitions is moved to the two remaining brokers
I start the broker back - it reappears in the cluster, and eventually the leadership gets rebalanced so each broker is the leader of 4 partitions.
It works OK, except I find the time spent before the rebalancing too long (like minutes). This happens under no load - no messages are sent to the cluster, no messages are consumed.
Kafka version 0.9.0.0, zookeeper 3.4.6
zookeeper tickTime = 2000
kafka zookeeper.connection.timeout.ms = 6000
(basically the default config)
Does anyone know what config parameters in kafka and/or zookeeper influence the time taken for the leader rabalancing ?
kafka consumer rebalancing takes long time (from 3 secs to 5 minutes)
Safe Broker Restarts It needs to check all of its recent logs to make sure that it doesn't have any incomplete messages. It will also start replicating partitions from where it left off when it was restarted. Keep checking kafka topics --describe until all topic-partitions have all brokers in the isr.
Active controller In a Kafka cluster, one of the brokers serves as the controller, which is responsible for managing the states of partitions and replicas and for performing administrative tasks like reassigning partitions.
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader.
as said in the official documentation http://kafka.apache.org/documentation.html#configuration (More details about broker configuration can be found in the scala class kafka.server.KafkaConfig.)
there actually is a leader.imbalance.check.interval.seconds
property which defaults to 300 (5 minutes), setting it to 30 seconds does what I need.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With