Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What determines Kafka consumer offset?

People also ask

How does offset work in Kafka?

OFFSET IN KAFKAThe offset is a unique id assigned to the partitions, which contains messages. The most important use is that it identifies the messages through id, which are available in the partitions. In other words, it is a position within a partition for the next message to be sent to a consumer.

What is consumer offset?

Consumer offset is used to track the messages that are consumed by consumers in a consumer group. A topic can be consumed by many consumer groups and each consumer group will have many consumers. A topic is divided into multiple partitions. A consumer in a consumer group is assigned to a partition.

Which consumer in Kafka will commit the current offset?

By default, as the consumer reads messages from Kafka, it will periodically commit its current offset (defined as the offset of the next message to be read) for the partitions it is reading from back to Kafka.

What is offset factor in Kafka?

The offset is a simple integer number that is used by Kafka to maintain the current position of a consumer. That's it. The current offset is a pointer to the last record that Kafka has already sent to a consumer in the most recent poll. So, the consumer doesn't get the same record twice because of the current offset.


It is a bit more complex than you described.
The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (2 supported offset storages now are Kafka and Zookeeper), and it also depends on what sort of consumer you use.

If you use a high-level java consumer then imagine following scenarios:

  1. You have a consumer in a consumer group group1 that has consumed 5 messages and died. Next time you start this consumer it won't even use that auto.offset.reset config and will continue from the place it died because it will just fetch the stored offset from the offset storage (Kafka or ZK as I mentioned).

  2. You have messages in a topic (like you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere and this time the auto.offset.reset config will decide whether to start from the beginning of the topic (earliest) or from the end of the topic (latest)

One more thing that affects what offset value will correspond to earliest and latest configs is log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. The latest offset will still remain the same as in previous example but the earliest one won't be able to be 0 because Kafka will already remove these messages and thus the earliest available offset will be 5.

Everything mentioned above is not related to SimpleConsumer and every time you run it, it will decide where to start from using the auto.offset.reset config.

If you use Kafka version older than 0.9, you have to replace earliest, latest with smallest,largest.


Just an update: From Kafka 0.9 and forth, Kafka is using a new Java version of the consumer and the auto.offset.reset parameter names have changed; From the manual:

What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):

earliest: automatically reset the offset to the earliest offset

latest: automatically reset the offset to the latest offset

none: throw exception to the consumer if no previous offset is found for the consumer's group

anything else: throw exception to the consumer.

I spent some time to find this after checking the accepted answer, so I thought it might be useful for the community to post it.


Further more there's offsets.retention.minutes. If time since last commit is > offsets.retention.minutes, then auto.offset.reset also kicks in