
Uneven Distribution of Messages in Kafka Partitions

I have a topic with 10 partitions and one consumer group with 4 consumers; the worker size is 3.

I can see that messages are unevenly distributed across the partitions: one partition holds a lot of data while another is empty.

How can I make my producer distribute the load evenly across all partitions, so that every partition is utilized properly?

Asked by Pacifist, Jun 17 '18 02:06



1 Answer

According to the JavaDoc comment in the DefaultPartitioner class itself, the default partitioning strategy is:

  • If a partition is specified in the record, use it.
  • If no partition is specified but a key is present choose a partition based on a hash of the key.
  • If no partition or key is present choose a partition in a round-robin fashion.
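The three rules can be illustrated with a short Java sketch. This is a minimal, hypothetical illustration, not Kafka's source: the class `PartitionRulesSketch` and its method `choosePartition` are invented names, the round-robin branch is a plain counter, and `Arrays.hashCode` stands in for the murmur2 hash that Kafka's partitioner applies to the serialized key bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the three partitioning rules; NOT Kafka's actual DefaultPartitioner.
class PartitionRulesSketch {
    // Counter used for the round-robin rule (assumption: a simple shared counter).
    private final AtomicInteger counter = new AtomicInteger(0);

    // Stand-in hash: Kafka uses murmur2 on the key bytes; Arrays.hashCode is used here only for illustration.
    static int hashKey(byte[] keyBytes) {
        return Arrays.hashCode(keyBytes) & 0x7fffffff; // clear the sign bit so the modulo is non-negative
    }

    int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                  // rule 1: an explicit partition always wins
        }
        if (keyBytes != null) {
            return hashKey(keyBytes) % numPartitions;  // rule 2: same key -> same partition
        }
        // rule 3: no partition and no key -> cycle through the partitions
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionRulesSketch p = new PartitionRulesSketch();
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(p.choosePartition(7, key, 10)); // prints 7
        System.out.println(p.choosePartition(null, key, 10) == p.choosePartition(null, key, 10)); // prints true
        for (int i = 0; i < 4; i++) {
            System.out.print(p.choosePartition(null, null, 10) + " "); // prints 0 1 2 3
        }
        System.out.println();
    }
}
```

Because rule 2 depends only on the key bytes and the partition count, every record carrying the same key ends up on the same partition, no matter how many partitions the topic has.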

https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java

Here are two possible reasons for the uneven distribution, depending on whether or not you specify a key when producing messages:

  • If you are specifying a key and getting an uneven distribution with the DefaultPartitioner, the most likely explanation is that many messages share the same key (or a small set of keys), so they all hash to the same partition.

  • If you are not specifying a key and are using the DefaultPartitioner, a less obvious behavior may be at play. Based on the rules above you would expect a round-robin distribution of messages, but this is not necessarily the case: an optimization introduced in 0.8.0 can cause the producer to keep using the same partition. See this FAQ entry for a more detailed explanation: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?
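To see how these causes show up as per-partition counts, here is a hypothetical Java simulation of 1,000 records sent to 10 partitions under three assumptions: 90% of records share one key, no key with strict round-robin, and no key with a producer that stays on one randomly chosen partition for a fixed window of records. The third case is only a simplified model of the "same partition being used" behavior mentioned above; the window size is an assumption, not a Kafka setting. `DistributionSketch` and its methods are invented names, and `Arrays.hashCode` stands in for Kafka's real key hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical simulation of per-partition record counts; not Kafka code.
class DistributionSketch {
    static final int NUM_PARTITIONS = 10;

    // Stand-in for Kafka's murmur2-based key hash (illustration only).
    static int partitionForKey(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return (Arrays.hashCode(bytes) & 0x7fffffff) % NUM_PARTITIONS;
    }

    // Case 1: 90% of records reuse the key "hot-key"; every 10th record gets a distinct key.
    static int[] skewedKeys(int records) {
        int[] counts = new int[NUM_PARTITIONS];
        for (int i = 0; i < records; i++) {
            String key = (i % 10 == 0) ? "key-" + i : "hot-key";
            counts[partitionForKey(key)]++;
        }
        return counts;
    }

    // Case 2: no key, strict round-robin as the JavaDoc rules suggest.
    static int[] roundRobin(int records) {
        int[] counts = new int[NUM_PARTITIONS];
        for (int i = 0; i < records; i++) {
            counts[i % NUM_PARTITIONS]++;
        }
        return counts;
    }

    // Case 3: no key, but the producer sticks to one random partition for `window` records
    // before picking another (hypothetical model; window size is an assumption).
    static int[] sticky(int records, int window) {
        int[] counts = new int[NUM_PARTITIONS];
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        int current = rnd.nextInt(NUM_PARTITIONS);
        for (int i = 0; i < records; i++) {
            if (i > 0 && i % window == 0) {
                current = rnd.nextInt(NUM_PARTITIONS);
            }
            counts[current]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("skewed keys: " + Arrays.toString(skewedKeys(1000)));
        System.out.println("round-robin: " + Arrays.toString(roundRobin(1000)));
        System.out.println("sticky:      " + Arrays.toString(sticky(1000, 1000)));
    }
}
```

The round-robin case yields exactly 100 records per partition, while in the other two cases a single partition receives at least 900 of the 1,000 records, which matches the "one partition has lots of data, another is empty" symptom from the question.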

Answered by Thomas Kabassis, Nov 11 '22 11:11
