I have a question about load balancing in Kafka. I created a topic with 10 partitions and started 2 consumers. The 10 partitions were divided between these consumers (5 partitions to the first and 5 to the second), and it works fine. Sometimes the first consumer receives messages, sometimes the second.
But at some point we may hit a situation where, for example, the second consumer receives a message that takes a long time (say 10 minutes) to handle.
So my question is: how does Kafka decide which partition to store a message in?
I think round robin is not a good idea in this case, because messages in the partitions assigned to the second consumer won't be handled until that consumer finishes its long-running work.
Update:
According to @Milan Baran's answer, the load is balanced on the producer side. But even if we provide a custom Partitioner implementation, the same problem remains: a message stored in a partition assigned to the consumer that is busy with long-running work will not be processed until that consumer finishes.
Maybe there is an additional load balancer somewhere else?
Load balancing with Kafka is a straightforward process and is handled by the Kafka producers by default. While it isn't traditional load balancing, it does spread the message load across partitions while preserving message ordering within each partition.
For most implementations you want to follow the rule of thumb of 10 partitions per topic, and 10,000 partitions per Kafka cluster. Going beyond that amount can require additional monitoring and optimization.
Kafka Partitioning
Partitioning takes the single topic log and breaks it into multiple logs, each of which can live on a separate node in the Kafka cluster. This way, the work of storing messages, writing new messages, and processing existing messages can be split among many nodes in the cluster.
Consumer partition assignment
Whenever a consumer enters or leaves a consumer group, the brokers rebalance the partitions across consumers, meaning Kafka handles load balancing with respect to the number of partitions per application instance for you. This is great; it's a major feature of Kafka.
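To make the 5 + 5 split from the question concrete, here is a minimal sketch of an even, range-style division of partitions among consumers. It is a simplified illustration and not Kafka's actual assignor code; the class name PartitionAssignmentSketch and the method assign are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration only; this is not Kafka's real assignor implementation. */
final class PartitionAssignmentSketch {

    /**
     * Splits partitions 0..numPartitions-1 into contiguous ranges, one per consumer.
     * When the split is uneven, the first consumers each get one extra partition.
     */
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        int base = numPartitions / numConsumers;   // partitions every consumer gets
        int extra = numPartitions % numConsumers;  // leftovers handed to the first consumers
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            assignment.add(owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(10, 2)); // [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
        System.out.println(assign(10, 3)); // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    }
}
```

The point of the sketch is that the split depends only on how many partitions and consumers there are, not on how long any consumer takes to process a message.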
The decision about which partition to use is not up to Kafka; the producer sending the message has to decide. See https://kafka.apache.org/documentation#producerconfigs
You can provide a partitioner class to decide which partition to pick.
partitioner.class
Partitioner class that implements the Partitioner interface. Default: org.apache.kafka.clients.producer.internals.DefaultPartitioner
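As a rough sketch, this setting could be supplied with the rest of the producer configuration like so. The broker address and the class name com.example.MyPartitioner are placeholders I made up; only the partitioner.class key comes from the documentation quoted above.

```java
import java.util.Properties;

/** Sketch of producer settings; the values are placeholders, not a recommended setup. */
final class ProducerConfigSketch {

    static Properties producerProps() {
        Properties props = new Properties();
        // Assumed broker address, for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Hypothetical custom class; it would have to implement Kafka's Partitioner interface.
        props.setProperty("partitioner.class", "com.example.MyPartitioner");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("partitioner.class"));
    }
}
```

These properties would then be passed to the producer when it is constructed, together with the usual serializer settings.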
Here is the description of the DefaultPartitioner strategy from its Javadoc:
/**
* The default partitioning strategy:
* <ul>
* <li>If a partition is specified in the record, use it
* <li>If no partition is specified but a key is present choose a partition based on a hash of the key
* <li>If no partition or key is present choose a partition in a round-robin fashion
*/
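The three rules in that comment can be sketched in plain Java roughly as follows. This is an illustration under assumptions, not the real DefaultPartitioner: the class and method names are invented, and Arrays.hashCode stands in for the hash Kafka actually applies to the serialized key.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of the three documented rules; not Kafka's implementation. */
final class DefaultPartitioningSketch {

    // Shared counter that drives the round-robin choice for records without a key.
    private final AtomicInteger counter = new AtomicInteger(0);

    int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        // Rule 1: a partition specified in the record is used as-is.
        if (explicitPartition != null) {
            return explicitPartition;
        }
        // Rule 2: with a key, hash it so equal keys always map to the same partition.
        // Arrays.hashCode is a stand-in hash (assumption); the sign bit is masked off
        // so the modulo result is never negative.
        if (keyBytes != null) {
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        // Rule 3: no partition and no key, so cycle through the partitions in order.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
```

Note that none of the three branches looks at how busy any consumer is. The choice is made entirely on the producer side, which is why a slow consumer does not change where new messages land.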