
Offsets stored in Zookeeper or Kafka?

I'm a bit confused about where offsets are stored when using Kafka and Zookeeper. It seems like offsets in some cases are stored in Zookeeper, in other cases they are stored in Kafka.

What determines whether the offset is stored in Kafka or in Zookeeper? And what are the pros and cons?

NB: Of course I could also store the offset on my own in some different data store but that is not part of the picture for this post.

Some more details about my setup:

  • I run these versions: KAFKA_VERSION="0.10.1.0", SCALA_VERSION="2.11"
  • I connect to Kafka/Zookeeper using kafka-node from my NodeJS application.
asked Dec 14 '16 by Nikola Schou

People also ask

Does Kafka store offset?

Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition.

Where is offset saved in Kafka?

Offsets in Kafka are stored as messages in a separate topic named '__consumer_offsets'. Each consumer periodically commits its position as a message to this topic.

What is offset in ZooKeeper?

Before Kafka 0.9, consumer offsets were stored in ZooKeeper. ZooKeeper holds small pieces of data such as configuration, status details and timestamps; each node holding such data is called a 'znode' in ZooKeeper parlance, and consumer offsets were kept in znodes per group, topic and partition.

What are offsets in Kafka?

The offset is a unique id assigned to each message within a partition. Its most important use is to identify a message by its position in the partition; in other words, it marks the position within a partition of the next message to be delivered to a consumer.


2 Answers

Older versions of Kafka (pre-0.9) store offsets only in ZooKeeper, while newer versions of Kafka by default store offsets in an internal Kafka topic called __consumer_offsets (newer versions can still commit to ZooKeeper, though).

The advantage of committing offsets to the broker is that the consumer does not depend on ZooKeeper, so clients only need to talk to brokers, which simplifies the overall architecture. Also, for large deployments with many consumers, ZooKeeper can become a bottleneck, while Kafka handles this load easily: committing offsets is the same as writing to a topic, and Kafka scales very well here (in fact, __consumer_offsets is created with 50 partitions by default).
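Since committing is just writing to __consumer_offsets, Kafka routes all commits for one group to a single partition of that topic, chosen (roughly) as abs(groupId.hashCode()) % numPartitions. A sketch of that mapping in JavaScript, re-implementing Java's String.hashCode (the group name is a placeholder):

```javascript
// Re-implementation of Java's String.hashCode with 32-bit signed overflow.
function javaHashCode(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return h;
}

// Sketch: which __consumer_offsets partition a group's commits land in.
// 50 is the default number of partitions for the offsets topic.
function offsetsPartition(groupId, numPartitions = 50) {
  return Math.abs(javaHashCode(groupId)) % numPartitions;
}

console.log(offsetsPartition('my-group')); // all commits for this group go to one partition
```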

I am not familiar with NodeJS or kafka-node -- how offsets are committed depends on the client implementation.

Long story short: with 0.10.1.0 brokers you can commit offsets to the __consumer_offsets topic, but only if your client implements this protocol.

In more detail, it depends on your broker and client version (and which consumer API you are using), because older clients can talk to newer brokers. First, both broker and client need to be version 0.9 or later to be able to write offsets into the Kafka topic. But if an older client connects to a 0.9 broker, it will still commit offsets to ZK.

For Java consumers:

It depends on which consumer you are using. Before 0.9 there were two "old consumers", namely the "high-level consumer" and the "low-level consumer"; both commit offsets directly to ZK. Since 0.9, the two were merged into a single "new consumer" (it basically unifies the low-level and high-level APIs of both old consumers -- this means that in 0.9 there are three types of consumers). The new consumer commits offsets to the brokers (i.e., the internal Kafka topic).

To make upgrading easier, there is also the possibility to "double commit" offsets using the old consumer (as of 0.9). If you enable this via dual.commit.enabled, offsets are committed to both ZK and the __consumer_offsets topic. This allows you to switch from the old consumer API to the new consumer API while moving your offsets from ZK to the __consumer_offsets topic.
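A minimal old-consumer config sketch for that dual-commit migration step (property names as documented for the old consumer; values are illustrative, not a recommendation):

```
# Old (Scala) consumer, 0.9+: read/write offsets in Kafka,
# but also keep committing them to ZooKeeper during the migration
offsets.storage=kafka
dual.commit.enabled=true
```

Once all consumers in the group are migrated, dual.commit.enabled can be turned off so commits go to Kafka only.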

answered Sep 19 '22 by Matthias J. Sax


It all depends on which consumer you're using. You should choose the right consumer based on your Kafka version.

For 0.8 brokers, use the HighLevelConsumer. The offsets for your groups are stored in ZooKeeper.

For brokers 0.9 and higher you should use the new ConsumerGroup. The offsets are stored on the Kafka brokers.

Keep in mind that HighLevelConsumer still works with versions past 0.8, but it was deprecated in 0.10.1 and support will probably go away soon. ConsumerGroup has rolling-migration options to help you move away from HighLevelConsumer if you were committed to using it.
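A sketch (not tested against a live broker) of a kafka-node ConsumerGroup configured with those migration options; option names follow kafka-node's README, while the addresses, group id and topic name are placeholders:

```javascript
// kafka-node ConsumerGroup options for migrating offsets from ZK to Kafka.
const options = {
  host: 'zookeeper:2181',   // ZK connection; newer kafka-node versions also accept kafkaHost
  groupId: 'my-group',
  fromOffset: 'earliest',   // where to start when no committed offset exists
  migrateHLC: false,        // set true for a one-off migration of old HighLevelConsumer offsets
  migrateRolling: true      // coordinate a rolling migration across group members
};

// Wiring it up (requires a running broker and `npm install kafka-node`):
// const { ConsumerGroup } = require('kafka-node');
// const consumer = new ConsumerGroup(options, ['my-topic']);
// consumer.on('message', (m) => console.log(m.topic, m.offset, m.value));
```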

answered Sep 21 '22 by Xiaoxin