
How does Kafka balance load across partitions?

I have a question about load balancing in Kafka. I created a topic with 10 partitions and started 2 consumers. The 10 partitions were divided between the consumers (5 partitions to the first and 5 to the second), and this works fine. Sometimes the first consumer processes messages, sometimes the second.

But at some point we can run into a situation where, for example, the second consumer receives a message that takes a long time (say 10 minutes) to handle.

So my question is: how does Kafka decide which partition to store a message in?

I think round robin is not a good idea in this case, because messages in the partitions assigned to the second consumer won't be handled until that consumer finishes its long-running work.

Updated!

According to @Milan Baran's answer, the load is balanced on the producer side. But even if we provide a custom Partitioner implementation, the same problem remains: a message stored in a partition assigned to the consumer that is doing long-running work will not be processed until that consumer finishes.

Maybe there is an additional load balancer somewhere else?

asked Nov 11 '16 13:11 by D. Krauchanka




1 Answer

The decision about which partition to use is not made by the Kafka broker; the producer sending the message has to decide. See https://kafka.apache.org/documentation#producerconfigs

You can provide a partitioner class to decide which partition to pick.

partitioner.class
Partitioner class that implements the Partitioner interface.
Default: org.apache.kafka.clients.producer.internals.DefaultPartitioner
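As a rough illustration, the setting could be placed in the producer configuration like this. This is only a sketch: the broker address localhost:9092 and the class name com.example.MyPartitioner are hypothetical placeholders, and the helper producerProps is made up for this example. The resulting Properties object is what you would normally hand to the KafkaProducer constructor.

```java
import java.util.Properties;

final class ProducerConfigSketch {

    // Builds producer settings that point "partitioner.class" at a custom class.
    // Both arguments are supplied by the caller; nothing here contacts a broker.
    static Properties producerProps(String bootstrapServers, String partitionerClassName) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The class named here must implement the Partitioner interface.
        props.setProperty("partitioner.class", partitionerClassName);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Properties props = producerProps("localhost:9092", "com.example.MyPartitioner");
        System.out.println(props.getProperty("partitioner.class"));
    }
}
```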

There is a description of the DefaultPartitioner strategy

/**
 * The default partitioning strategy:
 * <ul>
 * <li>If a partition is specified in the record, use it
 * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
 * <li>If no partition or key is present choose a partition in a round-robin fashion
 */
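The three rules in that comment can be sketched in plain Java, without any Kafka dependency. This is not Kafka's actual code: the class and method names are made up for illustration, and Arrays.hashCode stands in for the murmur2 hash that Kafka applies to the serialized key. What the sketch is meant to show is that none of the rules takes consumer state into account, so the producer has no way to steer messages away from a partition whose consumer is busy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class DefaultPartitioningSketch {

    // Counter backing the round-robin rule for records without a key.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param explicitPartition partition set on the record, or null if none
     * @param keyBytes          serialized record key, or null if none
     * @param numPartitions     number of partitions in the topic
     */
    int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            // Rule 1: a partition specified in the record is used as-is.
            return explicitPartition;
        }
        if (keyBytes != null) {
            // Rule 2: hash of the key. Stand-in hash (assumption); Kafka uses murmur2.
            // Masking the sign bit keeps the result non-negative.
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        // Rule 3: no partition and no key, so rotate through the partitions.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        DefaultPartitioningSketch sketch = new DefaultPartitioningSketch();
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8); // hypothetical key

        System.out.println(sketch.choosePartition(7, key, 10));     // explicit partition wins
        System.out.println(sketch.choosePartition(null, key, 10));  // same key, same partition
        System.out.println(sketch.choosePartition(null, null, 10)); // round robin
        System.out.println(sketch.choosePartition(null, null, 10)); // next partition in turn
    }
}
```

Records that share a key always land in the same partition under rule 2, which is what preserves per-key ordering.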
answered Sep 19 '22 21:09 by Milan Baran