
Balancing Kafka consumers

Let's say that I have 10 partitions for a given topic in Kafka. What would my options be to automatically load balance these 10 partitions between consumers?

I have read this post https://stackoverflow.com/a/28580363/317384 but I'm not sure it covers what I'm looking for, or maybe I'm just not getting it.

If I spin up a worker with one consumer for each partition, all work would be consumed by that worker.

But what happens if I spin up another instance of the same worker elsewhere? Will the client libraries/Kafka somehow detect this and rebalance the load between the two workers, so that some of the active consumers on worker1 become idle and the corresponding consumers on worker2 become active?

I would like to be able to add and remove workers on demand and spread the load across them. Is that possible?

e.g. from this: [image not available]

to this: [image not available]

asked Oct 30 '16 06:10 by Roger Johansson




1 Answer

Kafka consumers are organized into consumer groups, each containing one or more consumers. Within a group, each partition of a subscribed topic is assigned to exactly one consumer, and partitions are how Kafka scales out. If you have more consumers than partitions, some of your consumers will be idle. If you have more partitions than consumers, at least one consumer will be assigned more than one partition.
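To make the three cases concrete, here is a minimal sketch in plain Python that spreads partitions over the consumers of one group in round-robin order. It is only a simulation of the idea, not Kafka's actual assignor code; the function name `assign_partitions` and the consumer ids are made up for illustration.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partition ids over consumers round-robin (illustrative only)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for partition in range(num_partitions):
        # Each partition gets exactly one owner within the group.
        owner = ordered[partition % len(ordered)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # Fewer consumers than partitions: each consumer owns several partitions.
    print(assign_partitions(10, ["c1", "c2", "c3"]))
    # More consumers than partitions: the extra consumer owns nothing and idles.
    print(assign_partitions(2, ["c1", "c2", "c3"]))
```

In the second call, `c3` ends up with an empty list, which is the "idle consumer" case described above.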

When a new consumer joins, a rebalance occurs, and the new consumer is assigned some partitions previously assigned to other consumers. In your case, if there were 10 partitions all being consumed by one consumer, and another consumer joins, there'll be a rebalance, and afterwards, there'll be (typically) five partitions per consumer.
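The join/leave behaviour can be sketched with a toy model of a group, again in plain Python and without a real broker. The `ConsumerGroup` class, its methods, and the worker names are hypothetical, and the contiguous split used here is just one plausible strategy; the real assignment depends on the assignor the consumers are configured with.

```python
from typing import Dict, List


class ConsumerGroup:
    """Toy model of a consumer group; not Kafka's real coordinator logic."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.members: List[str] = []
        self.assignment: Dict[str, List[int]] = {}

    def _rebalance(self) -> None:
        # Recompute ownership from scratch, splitting the partitions into
        # contiguous chunks whose sizes differ by at most one.
        self.assignment = {m: [] for m in self.members}
        if not self.members:
            return
        base, extra = divmod(self.num_partitions, len(self.members))
        start = 0
        for i, member in enumerate(self.members):
            size = base + (1 if i < extra else 0)
            self.assignment[member] = list(range(start, start + size))
            start += size

    def join(self, member: str) -> None:
        # A membership change triggers a rebalance.
        self.members.append(member)
        self._rebalance()

    def leave(self, member: str) -> None:
        self.members.remove(member)
        self._rebalance()


if __name__ == "__main__":
    group = ConsumerGroup(10)
    group.join("worker1")
    print(group.assignment)  # worker1 owns all ten partitions
    group.join("worker2")
    print(group.assignment)  # five partitions each
    group.leave("worker1")
    print(group.assignment)  # worker2 takes everything back
```

This mirrors the scenario in the question: adding a second worker halves the first worker's share, and removing a worker hands its partitions to whoever remains.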

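It's worth noting that during a rebalance, the consumer group "pauses": consumers stop processing until the partitions have been reassigned. A rebalance also happens when a consumer leaves gracefully, or when the group coordinator (one of the brokers) detects, through missed heartbeats, that a consumer has gone away.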

answered Sep 30 '22 19:09 by ashic