Here are the Kafka docs for public ConsumerRecords<K,V> poll(long timeout):
Fetch data for the topics or partitions specified using one of the subscribe/assign APIs. It is an error to not have subscribed to any topics or partitions before polling for data. On each poll, consumer will try to use the last consumed offset as the starting offset and fetch sequentially. The last consumed offset can be manually set through seek(TopicPartition, long) or automatically set as the last committed offset for the subscribed list of partitions
My question is: who (the broker, the consumer, or ZooKeeper) is responsible for maintaining the offset, and where is it stored (in memory or on disk)? If the consumer maintains it only in memory, will the consumer start reading from the beginning after a restart, or does the consumer application need to persist the offset to disk?
As the "Offsets and Consumer Position" section in the docs you referenced mentions, the offsets are stored by Kafka (the broker):
Kafka maintains a numerical offset for each record in a partition
Specifically, the offsets that consumers commit are stored in an internal topic called "__consumer_offsets".
The "old consumer" API (deprecated in the upcoming v0.11) lets you choose whether to store offsets in Kafka or in ZooKeeper.
Additionally, you are free to save offsets on the consumer side and always seek to those offsets at startup, if you so choose.
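As a rough illustration of the consumer-side option, here is a minimal sketch of persisting offsets to local disk so they survive a restart. The class FileOffsetStore, its method names, and the properties-file format are all hypothetical and not part of the Kafka API; it uses only the Java standard library. With a real consumer, the value returned by load would be the argument you pass to seek(TopicPartition, long) at startup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper (not a Kafka class): keeps the next offset to read for
// each topic-partition in a local properties file.
class FileOffsetStore {
    private final Path file;
    private final Properties offsets = new Properties();

    // Load previously saved offsets if the file already exists.
    FileOffsetStore(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                offsets.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Record the offset of the next record to read and write it to disk.
    void save(String topic, int partition, long nextOffset) {
        offsets.setProperty(topic + "-" + partition, Long.toString(nextOffset));
        try (OutputStream out = Files.newOutputStream(file)) {
            offsets.store(out, "consumer offsets");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the stored offset, or the fallback if nothing was saved yet.
    long load(String topic, int partition, long fallback) {
        String value = offsets.getProperty(topic + "-" + partition);
        return value == null ? fallback : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "offsets-demo.properties");
        FileOffsetStore store = new FileOffsetStore(file);
        long start = store.load("orders", 0, 0L);
        // With a real consumer, seek(TopicPartition, long) would be called here with `start`.
        System.out.println("Would seek orders-0 to offset " + start);
        store.save("orders", 0, start + 10); // pretend 10 records were processed
    }
}
```

Note that this sketch writes the file after processing, so a crash between processing and saving would cause some records to be read again on restart. Whether that is acceptable depends on your application.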
So, in summary, depending on your consumer API version and your preference, offsets can be stored on the broker, in ZooKeeper, and/or on the consumer side.