I am a new user of Kafka and have been trialling it for about 2-3 weeks now. I believe I have a good understanding of how Kafka works for the most part. However, I have been fitting the API for my own Kafka consumer (this is obscure, but I'm following the guidelines for the new KafkaConsumer that is supposed to be available in v0.9, which is currently on the 'trunk' repo). Since then, I've had latency issues consuming from a topic when I have multiple consumers with the same groupID.
In this setup, my console consistently logs messages about a 'rebalance triggering'. Do rebalances occur when I add new consumers to a consumer group? Are they triggered to figure out which consumer instance in the same groupID gets which partitions, or are rebalances used for something else entirely?
I also came across this passage from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design and I just can't seem to understand it, so if someone could help me make sense of it that would be much appreciated:
Rebalancing is the process where a group of consumer instances (belonging to the same group) co-ordinate to own a mutually exclusive set of partitions of topics that the group is subscribed to. At the end of a successful rebalance operation for a consumer group, every partition for all subscribed topics will be owned by a single consumer instance within the group. The way rebalancing works is as follows. Every broker is elected as the coordinator for a subset of the consumer groups. The co-ordinator broker for a group is responsible for orchestrating a rebalance operation on consumer group membership changes or partition changes for the subscribed topics. It is also responsible for communicating the resulting partition ownership configuration to all consumers of the group undergoing a rebalance operation.
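The end state described in this passage can be illustrated with a toy assignor: every partition is owned by exactly one consumer in the group. This is a minimal sketch. It assumes a simple round-robin split over sorted consumer ids. The function names are hypothetical, and this is not how the coordinator or Kafka's own assignors are actually implemented.

```python
from collections import Counter


def round_robin_assign(consumers, partitions):
    """Toy assignor (hypothetical): deal partitions out to consumers in turn."""
    members = sorted(consumers)  # deterministic order so every run agrees
    if not members:
        return {}
    assignment = {m: [] for m in members}
    for i, tp in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(tp)
    return assignment


def is_exclusive_and_complete(assignment, partitions):
    """True if every partition is owned by exactly one consumer."""
    owners = Counter(tp for tps in assignment.values() for tp in tps)
    return set(owners) == set(partitions) and all(n == 1 for n in owners.values())


# Partitions are modeled as (topic, partition number) pairs.
partitions = [("orders", p) for p in range(5)]
assignment = round_robin_assign(["c1", "c2"], partitions)
print(assignment)
print(is_exclusive_and_complete(assignment, partitions))  # True
```

With two consumers and five partitions, c1 ends up with partitions 0, 2 and 4, and c2 with 1 and 3. No partition is shared.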
A Kafka rebalance happens when a consumer is either added to (joins) or removed from (leaves) the consumer group. This becomes dramatic during an application deployment rollout: multiple instances restart at the same time, and rebalance latency increases significantly.
Rebalancing is necessary for Kafka to work. It should not affect the application, but there are cases where rebalances have a huge impact. Thus, we want to reduce the number of unnecessary rebalances.
When a new consumer joins a consumer group, the set of consumers attempts to "rebalance" the load by assigning partitions to each consumer. If the set of consumers changes while this assignment is taking place, the rebalance fails and is retried. This setting controls the maximum number of attempts before giving up.
The configuration property for this is rebalance.max.retries, which is set to 4 by default.
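The retry behaviour described above can be sketched as a bounded retry loop. This is a hypothetical illustration, not the consumer's real code. The attempt callable stands in for one rebalance attempt and returns None when membership changed mid-assignment. max_retries plays the role of rebalance.max.retries, and backoff_s loosely mirrors the old consumer's rebalance.backoff.ms (wait between attempts).

```python
import time


def rebalance_with_retries(attempt, max_retries=4, backoff_s=0.0):
    """Retry a rebalance until it succeeds or the attempts run out.

    `attempt(n)` is a hypothetical callable: it returns an assignment dict
    on success, or None if group membership changed during attempt n.
    """
    for n in range(1, max_retries + 1):
        result = attempt(n)
        if result is not None:
            return result, n
        if n < max_retries:
            time.sleep(backoff_s)  # wait before trying again
    raise RuntimeError(f"rebalance failed after {max_retries} attempts")


# Pretend the membership was still changing during attempts 1 and 2.
unstable_attempts = {1, 2}


def attempt(n):
    if n in unstable_attempts:
        return None
    return {"c1": [0, 1], "c2": [2]}


assignment, attempts_used = rebalance_with_retries(attempt)
print(attempts_used)  # 3
```

If consumers keep joining or leaving for longer than the allowed attempts, the loop gives up with an error. This is why many instances restarting at once can make rebalancing slow or fail.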
Also, it might be happening if the following is true:
ZooKeeper session timeout. If the consumer fails to send a heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur.
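The session-timeout rule can be modeled with a small failure detector. This is a hypothetical sketch: the LivenessTracker class is made up for illustration, and it takes an injectable clock so it can be tested without sleeping. Any member whose last heartbeat is older than the timeout is reported as dead, which is the situation that would trigger a rebalance. The 6-second timeout is arbitrary.

```python
import time


class LivenessTracker:
    """Toy session-timeout failure detector (hypothetical class)."""

    def __init__(self, session_timeout_s, clock=time.monotonic):
        self.session_timeout_s = session_timeout_s
        self.clock = clock
        self.last_heartbeat = {}  # member id -> time of last heartbeat

    def heartbeat(self, member):
        self.last_heartbeat[member] = self.clock()

    def dead_members(self):
        """Members silent for longer than the session timeout."""
        now = self.clock()
        return sorted(m for m, t in self.last_heartbeat.items()
                      if now - t > self.session_timeout_s)


fake_now = [0.0]  # a controllable clock for the example
tracker = LivenessTracker(session_timeout_s=6.0, clock=lambda: fake_now[0])
tracker.heartbeat("c1")
tracker.heartbeat("c2")
fake_now[0] = 5.0
tracker.heartbeat("c1")  # c1 keeps heartbeating, c2 goes quiet
fake_now[0] = 7.0
print(tracker.dead_members())  # ['c2']
```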
Hope this helps!
Rebalance is the re-assignment of partition ownership among consumers within a given consumer group. Remember that every consumer in a consumer group is assigned its own, exclusive set of topic partitions.
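To make the re-assignment concrete, here is a toy sketch loosely modeled on range-style assignment. Each consumer gets a contiguous block of partitions, and earlier consumers take any remainder. The functions range_assign and moved_partitions are hypothetical and handle a single topic only. This is not Kafka's actual assignor code. It simply shows which partitions change owner when one consumer leaves.

```python
def range_assign(consumers, num_partitions):
    """Toy range-style assignor for one topic (hypothetical)."""
    members = sorted(consumers)
    if not members:
        return {}
    per, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, m in enumerate(members):
        count = per + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[m] = list(range(start, start + count))
        start += count
    return assignment


def moved_partitions(before, after):
    """Partitions whose owner differs between two assignments."""
    owner_before = {p: m for m, ps in before.items() for p in ps}
    owner_after = {p: m for m, ps in after.items() for p in ps}
    return sorted(p for p in owner_after if owner_before.get(p) != owner_after[p])


before = range_assign(["c1", "c2", "c3"], 6)
after = range_assign(["c1", "c2"], 6)  # c3 left the group
print(before)  # {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}
print(after)   # {'c1': [0, 1, 2], 'c2': [3, 4, 5]}
print(moved_partitions(before, after))  # [2, 4, 5]
print(range_assign(["a", "b", "c"], 2))  # {'a': [0], 'b': [1], 'c': []}
```

The last call shows that a consumer can end up with no partitions at all when the group has more consumers than the topic has partitions.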
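A rebalance happens when:
- a new consumer joins the consumer group;
- a consumer leaves the group, for example because it is shut down;
- a consumer is considered dead by the group coordinator because it stopped sending heartbeats within the session timeout;
- the partitions of a subscribed topic change, for example when partitions are added.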
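Each consumer group has a designated group coordinator (one of the brokers in the cluster) and a group leader (the first consumer that joins the group). Given these roles, a rebalance can be more or less described as follows:
- every consumer in the group sends a JoinGroup request to the group coordinator;
- the coordinator sends the group leader the list of members and their subscriptions;
- the leader computes the partition assignment using the configured partition assignor;
- every consumer sends a SyncGroup request; the leader's request carries the assignment, and the coordinator returns to each consumer its own partitions.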
This applies to Kafka 0.9, but I'm fairly sure it is still valid for newer versions.
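The coordinator/leader split described in this answer can be illustrated with a toy, in-process simulation. Everything here is hypothetical: Coordinator and leader_compute are made up for illustration, the round-robin plan is an arbitrary choice, and no network protocol is involved. It only mirrors the idea that the first consumer to join acts as leader and computes the assignment, while the coordinator hands each consumer its share.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy group coordinator (hypothetical; not Kafka's real API)."""
    members: list = field(default_factory=list)  # kept in join order
    assignments: dict = field(default_factory=dict)

    def join(self, member):
        # Record joins in arrival order; the first member becomes leader.
        if member not in self.members:
            self.members.append(member)

    @property
    def leader(self):
        return self.members[0] if self.members else None

    def sync(self, member, assignment):
        # Only the leader's proposed assignment is accepted.
        if member != self.leader:
            raise ValueError(f"{member} is not the group leader")
        self.assignments = assignment

    def assignment_for(self, member):
        return self.assignments.get(member, [])


def leader_compute(members, partitions):
    """Leader-side step: round-robin partitions over the members."""
    if not members:
        return {}
    result = {m: [] for m in members}
    for i, p in enumerate(partitions):
        result[members[i % len(members)]].append(p)
    return result


coord = Coordinator()
for consumer in ["c1", "c2", "c3"]:
    coord.join(consumer)  # every consumer joins via the coordinator
plan = leader_compute(coord.members, list(range(4)))  # leader builds the plan
coord.sync(coord.leader, plan)  # leader hands the plan to the coordinator
print(coord.leader)                # c1
print(coord.assignment_for("c2"))  # [1]
```

Keeping the assignment logic on the leader, rather than the broker, means the strategy can change without touching the coordinator. In this sketch, that just means swapping leader_compute.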