
Kafka Offset and Partition identification

Tags: apache-kafka

I have a few questions about Kafka. Please help me understand the following. According to the official documentation, each message in a partition is assigned a unique sequential ID called the offset.

  1. How are offset numbers generated: as messages arrive in a partition, or up front when the partition is created?

  2. Can the same offset number exist in another partition, given that partitions are independent of each other?

  3. If the same offset can exist in another partition, how does a consumer uniquely identify a message across multiple partitions?

  4. How does a consumer know which partition a particular offset belongs to? Please explain both cases: a message with a key and a message without a key.

Ravi asked Jul 16 '19 10:07




1 Answer

  1. Each partition stores the messages it receives in sequential order, and each message is identified by an offset. The offset is a sequential number that the broker generates and assigns automatically as each message is appended to the partition; it is not pre-generated when the partition is created.

  2. Yes, this is correct. Offsets are scoped to a partition, and message ordering is guaranteed only at the partition level. This means that if you have a topic with multiple partitions, messages in different partitions can have the same offset. An offset is therefore meaningful only within a single partition (as you can also see in the picture below, which is taken from the Kafka docs).

    [Image from the Kafka docs]


3/4. Consumers subscribe to topics, but behind the scenes each consumer is assigned particular partitions (if there is a single consumer in the consumer group, it is assigned all of them). Every record a consumer receives carries its topic, partition, and offset, so a message is identified uniquely by the combination of topic, partition, and offset rather than by the offset alone. This works the same way whether or not the message has a key; the key only influences which partition the producer writes to. As already mentioned, message order is guaranteed only within a single partition.
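The points above can be illustrated with a small in-memory model. This is only a sketch, not a Kafka client: `ToyTopic`, `Record`, `append`, and `read` are hypothetical names made up for this example, and the model ignores retention, compaction, and replication. It shows that offsets are assigned at append time, that two partitions can both contain offset 0, and that the partition number together with the offset distinguishes the records.

```python
from collections import namedtuple

# Hypothetical record type for illustration; a real consumer record
# likewise exposes topic, partition, and offset alongside the payload.
Record = namedtuple("Record", ["topic", "partition", "offset", "value"])


class ToyTopic:
    """Toy model of a topic: one independent append-only log per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.logs = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        log = self.logs[partition]
        # The offset is assigned when the message arrives:
        # it is the number of records already in this partition.
        record = Record(self.name, partition, len(log), value)
        log.append(record)
        return record

    def read(self, partition, offset):
        # A lookup needs both the partition and the offset.
        return self.logs[partition][offset]


topic = ToyTopic("events", num_partitions=2)
a = topic.append(0, "first in p0")
b = topic.append(1, "first in p1")
c = topic.append(0, "second in p0")

# a and b share offset 0 but live in different partitions.
print(a.partition, a.offset)
print(b.partition, b.offset)
print(c.partition, c.offset)
```

In this model, `topic.read(0, 0)` and `topic.read(1, 0)` return different records even though the offset is the same, which is why the partition has to be part of the identifier.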

Note that messages without a key are spread across the partitions of the topic. Older clients do this in a round-robin fashion; since Kafka 2.4 the default partitioner sticks to one partition per batch, which still evens out over time. Messages with the same key, on the other hand, are written to the same partition (as long as the number of partitions does not change), so you can use the key to keep related messages in order. For example, if you need to process users and you'd like an ordering guarantee for each distinct user, you can use the userID as the key, so that all events of that user are stored in the same partition. Later on, you will be able to consume these user-specific messages in the order they were originally received.
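The key-to-partition mapping described above can be sketched as follows. This is a simplified illustration, not the real partitioner: `choose_partition` and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` is used as a stand-in hash (the Java client's default partitioner hashes the key bytes with murmur2 instead). Keyless messages follow the round-robin behaviour described in the answer.

```python
import itertools
import zlib

NUM_PARTITIONS = 3  # assumed partition count for this example

# Cycles 0, 1, 2, 0, 1, ... for messages that have no key.
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key=None):
    """Pick a partition: hash of the key if present, else round-robin."""
    if key is None:
        return next(_round_robin)
    # Stand-in hash; the same key always maps to the same partition
    # as long as NUM_PARTITIONS stays the same.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


same_user = [choose_partition("user-42") for _ in range(3)]
keyless = [choose_partition() for _ in range(4)]

print(same_user)  # three identical partition numbers
print(keyless)    # [0, 1, 2, 0]
```

Because every event keyed by `"user-42"` lands in one partition, its events are appended to a single log and can be consumed in the order they were written.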

Giorgos Myrianthous answered Sep 18 '22 14:09