I have a few questions about Kafka. Please help me understand the following. As per the official documentation, each message in a partition has a unique sequential id, which is called the offset.
How are the offset numbers generated, i.e. are they assigned as messages arrive in a partition, or are they generated when the partition is created?
Can the same offset id/number exist in another partition, given that partitions are independent of each other?
If the same offset can exist in another partition, how does a consumer uniquely identify a message across multiple partitions?
How does the consumer know that a particular offset belongs to a particular partition? Please explain both situations: a message with a key and a message without a key.
Offsets and Consumer Position: Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition.
OFFSET IN KAFKA: The offset is a unique id assigned to each message within a partition. Its most important use is to identify the messages stored in that partition. In other words, it is a position within a partition, and a consumer's offset marks the next message to be sent to that consumer.
As each message is received by Kafka, it allocates a message ID to the message. Kafka then maintains the message ID offset per consumer and per partition to track consumption. Kafka brokers keep track of both what is sent to the consumer and what is acknowledged by the consumer by using two offset values.
Yes, this is correct. Message ordering is guaranteed only at the partition level. This means that if you have a topic with multiple partitions, messages in different partitions might have the same offset. Therefore, an offset is meaningful only within a single partition (as you can also see in the picture below, which is taken from the Kafka docs).
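To make the point above concrete, here is a minimal in-memory sketch of per-partition offsets. `ToyTopic` and its `append` method are hypothetical names made up for illustration, not part of any Kafka client, and the sketch ignores retention and compaction. It only shows that the offset is handed out when a message is appended, and that each partition counts independently.

```python
class ToyTopic:
    """In-memory stand-in for a topic (hypothetical, not the Kafka API)."""

    def __init__(self, num_partitions):
        # One independent append-only log per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        log = self.partitions[partition]
        # The offset is assigned on arrival: it is the next free
        # position in this partition's log, starting at 0.
        offset = len(log)
        log.append(value)
        return offset


topic = ToyTopic(num_partitions=2)
a = topic.append(0, "p0-first")
b = topic.append(0, "p0-second")
c = topic.append(1, "p1-first")

# Offset 0 exists in both partitions; only (partition, offset) is unique.
print(a, b, c)
```

Note that creating the partitions assigns no offsets at all; the numbers appear only as messages are appended.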
3/4. Consumers subscribe to topics, but behind the scenes each consumer is assigned particular partitions (if there is a single consumer in the consumer group, it is assigned all of the partitions). Every record a consumer receives carries its topic, partition, and offset, so the consumer always knows which partition an offset belongs to. Therefore, when the consumer reads messages from a particular partition, it can uniquely identify them by their offsets, which are unique within that partition, and across partitions by the combination of topic, partition, and offset. As I already mentioned, message order is guaranteed only within a single partition.
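As a rough illustration of that bookkeeping, the sketch below keeps one position per (topic, partition) pair and tags every record with its topic, partition, and offset. The `Record` type, the `poll` function, and the sample data are hypothetical and only mimic the idea; in the real Java client the equivalent information is exposed on `ConsumerRecord` through `topic()`, `partition()`, and `offset()`.

```python
from typing import NamedTuple


class Record(NamedTuple):
    topic: str
    partition: int
    offset: int
    value: str


# Hypothetical partition logs, keyed by (topic, partition).
logs = {
    ("events", 0): ["a", "b"],
    ("events", 1): ["c"],
}


def poll(logs, positions):
    """Return unread records and advance the per-partition positions."""
    out = []
    for (topic, partition), log in sorted(logs.items()):
        start = positions.get((topic, partition), 0)
        for offset in range(start, len(log)):
            out.append(Record(topic, partition, offset, log[offset]))
        # The position is the offset of the next record to read.
        positions[(topic, partition)] = len(log)
    return out


positions = {}
records = poll(logs, positions)
ids = [(r.topic, r.partition, r.offset) for r in records]
print(ids)
```

Offset 0 appears twice in `ids`, once for each partition, yet every (topic, partition, offset) triple is distinct, which is what lets the consumer tell the messages apart.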
Note that messages without a key are distributed evenly across the partitions of the topic, in a round-robin fashion (since Kafka 2.4 the default partitioner is "sticky": it fills a batch for one partition before moving to the next, but the long-run effect is still an even spread). On the other hand, messages with the same key are stored in the same partition, so you can use the key to group and order related messages. For example, if you need to process users and you'd like an ordering guarantee for each distinct user, you can use userID as the key, so that all the events of that user are stored in the same partition. Later on, you will be able to consume these user-specific messages in the order they were originally received.
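The partition choice described above can be sketched as follows. This is an assumption-laden toy: `choose_partition` is a hypothetical helper, `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key, and the keyless branch uses plain round-robin rather than the sticky behaviour of newer clients.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3
_round_robin = count()  # shared counter for keyless messages


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition for a message (illustrative only)."""
    if key is None:
        # No key: rotate through the partitions.
        return next(_round_robin) % num_partitions
    # Key present: a stable hash maps the same key to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keyed = [choose_partition("user-42") for _ in range(3)]
keyless = [choose_partition(None) for _ in range(6)]
print(keyed, keyless)
```

Because the keyed branch depends only on the key and the partition count, all events for `user-42` land in one partition and keep their relative order. The flip side is that changing the number of partitions changes the mapping for existing keys.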