I am using Kafka version 2.4.1 (recently upgraded from 2.2.0) and noticed a strange problem.
Even though the application (Kafka Streams) is down (there is no instance of it running), the consumer group command reports the group state as rebalancing. Our application runs as a Kubernetes pod.
root@bastion-0:# ./kafka-consumer-groups --describe --group groupname --bootstrap-server kafka-0.local:9094
Warning: Consumer group 'groupname' is rebalancing.
I have waited for quite some time now (30 minutes) and the command still reports 'rebalancing' even though the application is down.
Even when I try to delete the group, it gives the following message.
root@bastion-0:/app/kafka_2.12-2.4.1/bin# ./kafka-consumer-groups.sh --delete --group group1 --bootstrap-server kafka.local:9094
Error: Deletion of some consumer groups failed:
* Group 'group1' could not be deleted due to: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.GroupNotEmptyException: The group is not empty.
root@bastion-0:/app/kafka_2.12-2.4.1/bin# ./kafka-consumer-groups.sh --delete --group group2 --bootstrap-server kafka.local:9094
Error: Deletion of some consumer groups failed:
* Group 'group2' could not be deleted due to: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.GroupNotEmptyException: The group is not empty.
When I look at the group members, there are members listed even though the application is NOT running. Is this because of the new rebalance protocol (cooperative rebalancing)?
Where does ./kafka-consumer-groups read the group membership information from? Does it keep the member information even when the application is down?
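For reference, when run with --bootstrap-server the tool asks the group coordinator broker for this information through Kafka's Admin API; nothing is read on the client side. A minimal Java sketch of the equivalent query (the bootstrap server and group name are taken from the commands above; the class name is arbitrary):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class DescribeGroup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.local:9094");

        try (AdminClient admin = AdminClient.create(props)) {
            // Ask the group coordinator broker for the group's current state and members.
            ConsumerGroupDescription description = admin
                    .describeConsumerGroups(Collections.singletonList("groupname"))
                    .describedGroups()
                    .get("groupname")
                    .get();

            System.out.println("State:   " + description.state());
            System.out.println("Members: " + description.members());
        }
    }
}
```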
Update:
I brought up the application with a different group name and it came up fine; I can describe that group as well. Even then, the old group remains in the 'rebalancing' state.
New update: I also found that the group coordinator for all of these groups was one particular node in the Kafka cluster, and when I rebooted that node, the problem went away.
Question:
Where is the group metadata stored? Could the problem be related to a corrupted ZooKeeper?
Having a higher number of partitions leads to lower throughput per single partition. On the other hand, with a higher number of app instances, the restart of a single instance results in less Kafka lag during rebalancing.
A consumer rebalance is initiated when a consumer requests to join or leave a group. The group leader receives the list of all active consumers from the group coordinator and decides which partition(s) to assign to each consumer using the configured partition assignor.
During a rebalance event, every consumer that is still in communication with the group coordinator must revoke and then regain its partitions, for all partitions within its assignment. The more partitions there are to manage, the longer the group takes to re-establish those assignments.
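As an illustration of that revoke/re-assign cycle, here is a minimal sketch of a consumer that logs what it loses and regains on each rebalance (the topic and group names are placeholders). The partition.assignment.strategy line is where the cooperative assignor mentioned in the question would be opted into; the default for the plain Java consumer in 2.4 is still the eager RangeAssignor:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RebalanceLoggingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.local:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Opt in to incremental (cooperative) rebalancing instead of the eager default.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    System.out.println("Revoked:  " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    System.out.println("Assigned: " + partitions);
                }
            });
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(r -> System.out.println(r.value()));
            }
        }
    }
}
```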
If the consumer crashes or is shut down, its partitions will be re-assigned to another member, which will begin consumption from the last committed offset of each partition. If the consumer crashes before any offset has been committed, then the consumer which takes over its partitions will use the reset policy.
There are several causes for a consumer group rebalance: a new consumer joins the group, an existing consumer leaves the group, or the broker believes a consumer may have failed. Beyond these, any other need to reassign resources will also trigger a rebalance.
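The "broker thinks a consumer may have failed" case, and the reset policy mentioned above, are governed by a handful of consumer settings. A minimal sketch of the relevant configuration (the values shown are illustrative, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FailureDetectionConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.local:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        // If no heartbeat reaches the coordinator within session.timeout.ms,
        // the consumer is declared dead and a rebalance is triggered.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
        // If poll() is not called within max.poll.interval.ms, the consumer
        // leaves the group and its partitions are reassigned.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        // Used only when there is no valid committed offset for a partition.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }
}
```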
During a rebalance, message processing is paused, which impacts throughput.
Consumer groups are an important characteristic of Kafka's distributed message processing, managing consumers and making it possible to scale up applications. They group together the consumers of a topic, and the topic's partitions are assigned across those consumers.
Within the consumer group, consumers are assigned topic partitions from which to consume. Group membership is managed on the broker side, while partition assignment is managed on the client side. The broker has no knowledge of what the resources are or how they are assigned amongst the consumers.
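To illustrate the client-side half of that split, a consumer can report its own partition assignment after joining the group, since the broker only tracks membership and treats the assignment itself as opaque. A brief sketch (the group and topic names are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ShowAssignment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.local:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            consumer.poll(Duration.ofSeconds(5)); // joins the group and receives an assignment
            Set<TopicPartition> assigned = consumer.assignment();
            System.out.println("Partitions assigned to this instance: " + assigned);
        }
    }
}
```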
This was raised as a bug here: issues.apache.org/jira/browse/KAFKA-9935, with duplicate https://issues.apache.org/jira/browse/KAFKA-9752.
This now appears to have been fixed since March in versions 2.2.3, 2.3.2, 2.4.2, and 2.5 and above, so make sure to use an up-to-date version.