
Apache Kafka order of messages with multiple partitions

Tags:

apache-kafka

As per the Apache Kafka documentation, message order is guaranteed only within a single partition of a topic. In that case, what parallelism benefit do we get? Isn't that equivalent to a traditional MQ?

Rajan R.G asked Apr 23 '15 10:04



People also ask

Does Kafka maintain order across partitions?

Kafka only guarantees message ordering within a partition, not across partitions. This places a burden on producers and consumers to follow certain Kafka design patterns to ensure ordering. For example, producers can partition data by key, and each partition can be read by a single consumer.

How does Kafka consumer read from multiple partitions?

The consumers in a group divide the topic's partitions among themselves as evenly as possible, with each partition consumed by only one consumer in the group. When there are fewer consumers than partitions, some consumers read messages from more than one partition.

Are Kafka messages ordered?

At a high level, Kafka gives the following guarantee: messages sent by a producer to a particular topic partition will be appended in the order they are sent.


1 Answer

In Kafka, the degree of consumer parallelism for a topic is bounded by its number of partitions.

For example, assume that your messages are partitioned by user_id, and consider 4 messages with user_ids 1, 2, 3 and 4. Assume that you have a "users" topic with 4 partitions.

Since partitioning is based on user_id, assume that the message with user_id 1 goes to partition 1, the message with user_id 2 goes to partition 2, and so on.
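
The key-to-partition mapping described above can be sketched roughly as follows. This is only an illustration, not Kafka's actual code: the Java client's default partitioner hashes the serialized key with murmur2, whereas this sketch substitutes CRC32 to stay within the standard library. The function name `choose_partition` and the `user-N` keys are made up for the example. Note also that real partition indices start at 0 and a hash does not guarantee the neat "user 1 goes to partition 1" layout assumed above; the only property that matters is that the same key always maps to the same partition.

```python
import zlib

NUM_PARTITIONS = 4  # matches the 4-partition "users" topic in the example


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index in [0, num_partitions).

    Assumption: CRC32 is a stand-in hash chosen for this sketch;
    Kafka's default partitioner uses murmur2 on the serialized key.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical events: two of them belong to the same user.
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]

# Every event with the same key lands in the same partition,
# which is what keeps that user's events in order.
placement = [(key, action, choose_partition(key)) for key, action in events]
for key, action, partition in placement:
    print(f"{key} {action} -> partition {partition}")
```

Both `user-1` events end up in one partition, so a consumer of that partition sees `login` before `logout`.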

Also assume that you have 4 consumers for the topic, all in the same consumer group. Kafka will then assign one partition to each consumer, so as soon as the 4 messages are published, they can be consumed in parallel, one per consumer.

If you had 2 consumers for the topic instead of 4, each consumer would handle 2 partitions, and the consuming throughput would be roughly half.
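
To make the 4-consumer and 2-consumer cases concrete, here is a small sketch of how partitions might be spread across the consumers in a group. It assumes a plain round-robin split; in a real cluster the assignment is computed by the group's configured assignor, so the exact layout may differ. The function name `assign_partitions` and the consumer names `c1`..`c4` are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition exactly one owner, cycling through the consumers.

    Assumption: simple round-robin, used here only for illustration.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]  # the 4 partitions of the "users" topic

four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
two_consumers = assign_partitions(partitions, ["c1", "c2"])

print(four_consumers)  # one partition per consumer
print(two_consumers)   # two partitions per consumer
```

In either case no partition has more than one owner within the group, which is why adding consumers beyond the partition count does not add parallelism.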

To completely answer your question, Kafka only provides a total order over messages within a partition, not between different partitions in a topic.

i.e., if consumption is very slow in partition 2 and very fast in partition 4, then the message with user_id 4 will be consumed before the message with user_id 2. This is how Kafka is designed.
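
The slow-partition versus fast-partition scenario can be simulated without a broker. The sketch below is a toy model under stated assumptions: each partition is an in-memory queue of `(user_id, sequence_number)` pairs, and the `speed` table (messages handled per simulated tick) is invented for the example. It is not how a Kafka consumer is implemented; it only shows which orderings are and are not preserved.

```python
from collections import deque

# Each partition is an ordered log of (user_id, sequence_number).
partitions = {
    2: deque([(2, 1), (2, 2), (2, 3)]),
    4: deque([(4, 1), (4, 2), (4, 3)]),
}

# Hypothetical speeds: partition 2 is consumed slowly, partition 4 quickly.
speed = {2: 1, 4: 3}

consumed = []  # the interleaved order in which messages are processed
while any(partitions.values()):
    for partition_id, log in partitions.items():
        # Each tick, a partition's consumer handles up to `speed` messages.
        for _ in range(speed[partition_id]):
            if log:
                consumed.append(log.popleft())

print(consumed)

# Order within each partition is preserved...
user_2_sequence = [seq for user, seq in consumed if user == 2]
user_4_sequence = [seq for user, seq in consumed if user == 4]
print(user_2_sequence, user_4_sequence)

# ...but there is no ordering between partitions: all of partition 4
# finishes before the second message of partition 2 is processed.
print(consumed.index((4, 3)) < consumed.index((2, 2)))
```

If the application needs ordering across two messages, the usual approach is to give them the same key so they share a partition.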

Vishal John answered Sep 26 '22 13:09
