
How does Zookeeper/Kafka retain offset for a consumer?

Is the offset a property of the topic/partition, or is it a property of a consumer?

If it's a property of a consumer, does that mean multiple consumers reading from the same partition could have different offsets?

Also, what happens if a consumer goes down? How does Kafka know it's dealing with the same consumer when it comes back online? Presumably a new client ID is generated, so it wouldn't have the same ID as before.

yogibear asked Aug 11 '18 21:08


2 Answers

In most cases the offset is a property of a consumer group, not of an individual consumer or of the topic itself. When writing a consumer, you normally specify the consumer group with the group.id parameter. Kafka uses this group ID to store and recover the latest committed offset for each partition in the internal topic __consumer_offsets, which lives in the Kafka cluster itself. This means consumers in different groups can be at different offsets on the same partition. The consumer group is used not only for offsets but also to ensure that each partition is consumed by only one client per consumer group.
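To make the "property of a consumer group" idea concrete, here is a toy in-memory model of how committed offsets can be keyed. It is only a sketch: `ToyOffsetStore` and the group and topic names are made up for illustration, and it does not talk to Kafka or mirror the real on-disk format of __consumer_offsets.

```python
from typing import Dict, Optional, Tuple

# Committed offsets are tracked per (group.id, topic, partition).
OffsetKey = Tuple[str, str, int]


class ToyOffsetStore:
    """Hypothetical in-memory stand-in for __consumer_offsets (illustration only)."""

    def __init__(self) -> None:
        self._offsets: Dict[OffsetKey, int] = {}

    def commit(self, group_id: str, topic: str, partition: int, offset: int) -> None:
        # A later commit for the same key overwrites the earlier one.
        self._offsets[(group_id, topic, partition)] = offset

    def fetch(self, group_id: str, topic: str, partition: int) -> Optional[int]:
        # None means this group has never committed for this partition.
        return self._offsets.get((group_id, topic, partition))


store = ToyOffsetStore()
# Two made-up groups reading the same partition progress independently.
store.commit("billing", "orders", 0, 42)
store.commit("analytics", "orders", 0, 7)
```

Because the key includes the group ID but not any per-process client ID, two groups can sit at different positions on the same partition, and a group that has never committed simply has no entry.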

However, Kafka gives you a lot of flexibility: if you need to, you can store the offsets somewhere else, keyed by whatever criteria you want. But in most cases, following the consumer group concept and storing the offsets inside Kafka is the best approach.

Jakub answered Sep 21 '22 02:09


Kafka identifies a consumer by its group.id, a consumer configuration property that each consumer should set. From the documentation:

A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy

As for the offset: it is tracked per consumer group and stored on the brokers. As a consumer processes messages from a topic, it commits its offset. The committed offset is the position of the next message to read, so after consuming offsets 1 through 10 the consumer commits 11 and resumes from 11 the next time. Offsets can be committed manually or automatically, controlled by enable.auto.commit:

If true the consumer's offset will be periodically committed in the background.
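A minimal sketch of the commit arithmetic, assuming the usual Kafka convention that the committed offset is the position of the next record to read rather than the last one processed. The helper name `next_offset_to_commit` is made up for this example and is not part of any Kafka client.

```python
from typing import List, Optional


def next_offset_to_commit(consumed_offsets: List[int]) -> Optional[int]:
    """Return the offset to commit after processing a batch from one partition.

    The value is one past the highest processed offset, i.e. where the
    group should resume. Returns None if nothing was processed.
    """
    if not consumed_offsets:
        return None
    return max(consumed_offsets) + 1


batch = list(range(1, 11))               # processed offsets 1 through 10
committed = next_offset_to_commit(batch)  # position to resume from
```

Committing the last processed offset itself instead of the next one would cause that record to be read again after a restart.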

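Each consumer group has its own committed offsets. Kafka does not need to recognize the individual consumer: whether a consumer is new or was restarted, if it joins with the same group.id it resumes from the group's committed offset for each partition it is assigned.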
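The restart behaviour can be sketched as a small decision function. This is a simplified model, not client code: `starting_position` and its parameters are hypothetical, and it assumes the consumer setting auto.offset.reset (with its "earliest" and "latest" values) decides where to start when the group has no committed offset. It ignores other cases, such as a committed offset that is out of range.

```python
from typing import Optional


def starting_position(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Pick where a consumer starts reading a partition after (re)joining a group."""
    if committed is not None:
        # The group already has a position: resume there, regardless of
        # which process or client ID is doing the reading.
        return committed
    if auto_offset_reset == "earliest":
        return log_start   # read everything still retained
    if auto_offset_reset == "latest":
        return log_end     # read only records produced from now on
    # Any other setting is treated as an error in this toy model.
    raise ValueError(f"no committed offset and reset policy is {auto_offset_reset!r}")


resumed = starting_position(committed=11, log_start=0, log_end=50)
fresh = starting_position(committed=None, log_start=0, log_end=50, auto_offset_reset="earliest")
```

Here `resumed` comes from the group's stored position, while `fresh` models a brand-new group.id that has never committed anything.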

Deadpool answered Sep 22 '22 02:09