Who is responsible for offset maintenance?

Here are the Kafka docs for public ConsumerRecords<K,V> poll(long timeout):

Fetch data for the topics or partitions specified using one of the subscribe/assign APIs. It is an error to not have subscribed to any topics or partitions before polling for data. On each poll, consumer will try to use the last consumed offset as the starting offset and fetch sequentially. The last consumed offset can be manually set through seek(TopicPartition, long) or automatically set as the last committed offset for the subscribed list of partitions

My question is: who (the broker, the consumer, or ZooKeeper) is responsible for maintaining the offset, and where is it stored (in memory or on disk)? If the consumer maintains it only in memory, will the consumer start reading from the beginning after a restart, or does the consumer application need to persist the offset to disk?

emilly Avatar asked Jun 18 '17 14:06





1 Answer

As the "Offsets and Consumer Position" section in the docs you referenced mentions, the offsets are stored by Kafka (the broker):

Kafka maintains a numerical offset for each record in a partition

Specifically, it stores the offsets that consumers commit in an "internal" consumer offsets topic called "__consumer_offsets".

The "old consumer" API (deprecated in the upcoming v0.11) allows you to choose whether to store offsets in Kafka or in ZooKeeper.

Additionally, you are free to save offsets on the consumer side and always seek to those offsets at startup, if you so choose.
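As a rough illustration of the consumer-side option, here is a minimal sketch that persists the next offset to read for each partition in a local properties file and loads it back at startup. The FileOffsetStore class, the "topic-partition" key format, and the file layout are assumptions made up for this example, not part of the Kafka client API; a real consumer would pass each restored value to seek(TopicPartition, long) before its first poll.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for this example only; not part of the Kafka client API.
final class FileOffsetStore {

    // Writes one "topic-partition=offset" entry per partition (assumed key format).
    static void save(Path file, Map<String, Long> nextOffsets) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, Long> e : nextOffsets.entrySet()) {
            props.setProperty(e.getKey(), Long.toString(e.getValue()));
        }
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "next offset to read, per partition");
        }
    }

    // Reads the saved offsets; returns an empty map if nothing was saved yet.
    static Map<String, Long> load(Path file) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        if (!Files.exists(file)) {
            return offsets;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        for (String key : props.stringPropertyNames()) {
            offsets.put(key, Long.parseLong(props.getProperty(key)));
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offsets", ".properties");
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("orders-0", 42L); // "orders" is a made-up topic name
        save(file, offsets);
        // At startup, a real consumer would seek to each restored offset here.
        System.out.println(load(file));
    }
}
```

With this approach the application decides when the file is written, so saving the offset only after a record has been fully processed means a restart re-reads at most the records that were in flight.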

So, in summary, depending on your consumer API version and your preference, offsets can be stored on the broker, in ZooKeeper, and/or on the consumer side.

Michal Borowiecki Avatar answered Oct 03 '22 16:10
