
Can a Kafka consumer group freeze during a rebalance?

Can a rolling deployment of a Kafka consumer group cause the group to freeze?

Consider this scenario:

  1. we start a rolling deployment
  2. one consumer leaves the group
  3. Kafka notices this and triggers a rebalance (hence consumption stops)
  4. rebalance happens but soon a new consumer wants to join
  5. also another consumer leaves
  6. again a new rebalance happens
  7. (loop till deployment is complete)

So if you have a large enough cluster and the deployment takes some time to complete on each machine (which is usually the case), will this lead to a complete freeze in consumption?

If yes, what are the strategies for updating a consumer group in production?
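To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. It assumes the worst case where every leave and every join triggers its own rebalance and consumption is paused for the whole of each one; the function name and the 5-second rebalance duration are hypothetical, and real groups may coalesce several membership changes into a single rebalance.

```python
def worst_case_pause_seconds(num_consumers: int, rebalance_seconds: float) -> float:
    """Estimate total paused time for a one-at-a-time rolling restart.

    Assumption: each restarted consumer causes two separate rebalances
    (one when it leaves, one when its replacement joins).
    """
    rebalances = 2 * num_consumers
    return rebalances * rebalance_seconds


# Hypothetical group of 10 consumers, 5 seconds per rebalance.
print(worst_case_pause_seconds(10, 5.0))  # 100.0
```

Under these assumptions the pause grows linearly with the size of the group, which is why a long rolling deployment can look like a freeze.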

swayamraina asked Sep 20 '20 08:09



People also ask

What happens during a Kafka rebalance?

During a rebalance, consumers experience an interruption, and lag builds up between the latest messages consumed from the topic and the most recent messages available within the topic.

How long does a Kafka rebalance take?

With the eager rebalance protocol, consumers process no data during the entire rebalance, i.e. until the partitions are reassigned. By default the rebalance timeout is 5 minutes (the Java consumer takes it from max.poll.interval.ms, which defaults to 300000 ms), which can be a very long period during which the growing consumer lag becomes an issue.

How do you fix "Attempt to heartbeat failed since group is rebalancing"?

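If you want to prevent this from happening, you can either increase the session timeout (session.timeout.ms) or make your consumer send heartbeats more often (heartbeat.interval.ms). In clients older than 0.10.1, heartbeats were sent from inside poll(), so you had to call poll() frequently enough. Newer Java clients send heartbeats from a background thread, and max.poll.interval.ms limits the time allowed between poll() calls instead.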
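As a rough illustration of the settings named above, the sketch below holds them in a plain dictionary and checks the commonly documented guideline that heartbeat.interval.ms should be no more than one third of session.timeout.ms. The values are illustrative only and the helper name is hypothetical; the property names are the standard consumer configuration keys.

```python
# Illustrative values only; tune them for your own workload.
consumer_config = {
    "session.timeout.ms": 30000,      # member is removed if no heartbeat arrives in this window
    "heartbeat.interval.ms": 10000,   # how often the consumer sends heartbeats
    "max.poll.interval.ms": 300000,   # maximum allowed gap between poll() calls
}


def heartbeat_settings_ok(config: dict) -> bool:
    """Check that the heartbeat interval is at most 1/3 of the session timeout."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]


print(heartbeat_settings_ok(consumer_config))  # True
```

A dictionary like this can be passed to most client libraries with little change, but check the exact key names your client expects.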

What triggers a Kafka rebalance?

There are several causes for a consumer group rebalance. A new consumer joins the group, an existing consumer leaves the group, or the broker thinks a consumer may have failed. Beyond these, any other need to reassign partitions will also trigger a rebalance.


1 Answer

Yes, that's definitely possible. There have been a number of recent improvements to mitigate the downtime during events like this. I'd recommend enabling one or both of the following features:

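Static membership (enabled by giving each consumer a unique, stable group.instance.id) was added in Kafka 2.3 and can prevent a rebalance from occurring when a known member of the group is bounced, provided the member rejoins before its session timeout expires. This requires both the client and the broker to be on version 2.3+.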
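A minimal sketch of what a static-membership configuration might look like. The helper, the group name, and the instance name are hypothetical; the assumption is that each instance has a name that survives restarts (for example a fixed host or pod name) and that session.timeout.ms is set longer than a typical restart.

```python
def static_member_config(group_id: str, instance_name: str, session_timeout_ms: int) -> dict:
    """Build consumer settings for static membership.

    Assumption: instance_name is unique within the group and stays the same
    across restarts of that instance.
    """
    return {
        "group.id": group_id,
        # A stable ID lets the broker recognise the restarted member.
        "group.instance.id": f"{group_id}-{instance_name}",
        # Should exceed the expected restart time, otherwise the member is
        # removed from the group and a rebalance happens anyway.
        "session.timeout.ms": session_timeout_ms,
    }


config = static_member_config("orders", "consumer-0", 120000)
print(config["group.instance.id"])  # orders-consumer-0
```

The broker caps the session timeout through group.max.session.timeout.ms, so very large values may be rejected.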

Incremental cooperative rebalancing enables the group to have faster rebalances and allows individual members to continue consuming throughout the rebalance. You'll still see rebalances during a rolling deployment, but they won't result in a complete freeze in consumption for the duration. This is completely client side, so it will work with any broker version, but your clients should be on version 2.5.1+.
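To illustrate the difference, here is a toy model (not the real assignor logic) of which partitions each consumer must stop reading during a rebalance. With the eager protocol every member gives up everything it owns; with the cooperative protocol only the partitions that actually move are revoked. The function and the example assignments are hypothetical. In the Java client the cooperative protocol is selected with the assignor class shown in the config.

```python
# Java-client setting that selects the cooperative protocol.
cooperative_config = {
    "partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
}


def revoked_partitions(old: dict, new: dict, cooperative: bool) -> dict:
    """Return, per existing consumer, the partitions it stops consuming."""
    revoked = {}
    for consumer, owned in old.items():
        if cooperative:
            # Only partitions that move to another member are given up.
            keep = set(new.get(consumer, []))
            revoked[consumer] = sorted(p for p in owned if p not in keep)
        else:
            # Eager protocol: everything is revoked before reassignment.
            revoked[consumer] = sorted(owned)
    return revoked


old = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
new = {"c1": [0, 1], "c2": [3, 4], "c3": [2, 5]}  # c3 joins the group
print(revoked_partitions(old, new, cooperative=False))  # {'c1': [0, 1, 2], 'c2': [3, 4, 5]}
print(revoked_partitions(old, new, cooperative=True))   # {'c1': [2], 'c2': [5]}
```

In this example partitions 0, 1, 3 and 4 keep being consumed throughout the cooperative rebalance.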

S Blee-G answered Sep 21 '22 15:09
