Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How customer offsets are maintained in mirrored cluster in Kafka?

Lets say I have two Kafka clusters and I am using mirror maker to mirror the topics from one cluster to another. I understand consumer has an embedded producer to commit offsets to __consumer-offset topic in Kafka cluster. I need to know what will happen if primary Kafka cluster goes down? Do we sync the __consumer-offset topic as well? Because secondary cluster could have different number of brokers and other settings, I think.

Please tell how Kafka mirrored cluster takes care of consumer offset?

Does auto.offset.reset setting play a role here?

like image 386
Yogesh Gupta Avatar asked Jan 31 '17 05:01

Yogesh Gupta


People also ask

How offsets are maintained in Kafka?

Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition.

What determines Kafka consumer offset?

Consumer offset is recorded in Kafka so if the consumer processing the partition in the consumer group goes down and when the consumer comes back, the consumer will read the offset to start reading the messages from the topic from where it is left off. This avoids duplication in message consumption.

How do offsets work in Kafka?

The offset is a simple integer number that is used by Kafka to maintain the current position of a consumer. That's it. The current offset is a pointer to the last record that Kafka has already sent to a consumer in the most recent poll. So, the consumer doesn't get the same record twice because of the current offset.

How does Kafka Mirror Maker work?

Kafka MirrorMaker is a stand-alone tool for copying data between two Apache Kafka® clusters. It is little more than a Kafka consumer and producer hooked together. Data will be read from topics in the origin cluster and written to a topic with the same name in the destination cluster.


1 Answers

Update

Since Apache Kafka 2.7.0, MirrorMaker is able to replicate committed offsets. Cf https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0

Original Answer

Mirror maker does not replicate offsets.

Furthermore, auto.offset.reset is completely unrelated to this, because it's a consumer setting that defines where a consumer should start reading for the case, that no valid committed offset is found at startup.

The reason for not mirroring offsets is basically, that they can be meaningless on the mirror cluster because it is not guaranteed, that the messages will have the same offsets in both cluster.

Thus, in fail over case, you need to figure out something "smart" by yourself. One way would be to remember the metadata timestamp of you last processed record. This allows you to "seek" based on timestamp on the mirror cluster to find an approximate offset there. (You will need Kafka 0.10 for this.)

like image 96
Matthias J. Sax Avatar answered Oct 09 '22 04:10

Matthias J. Sax