Kafka Streams does not increment offset by 1 when producing to topic

Tags: java, apache-kafka, kafka-consumer-api, kafka-producer-api, apache-kafka-streams

I have implemented a simple Kafka Dead letter record processor.

It works perfectly when using records produced from the Console producer.

However, I find that our Kafka Streams applications do not guarantee that offsets increase by 1 for each record produced to the sink topics.

Dead Letter Processor Background:

I have a scenario where records may be received before all the data required to process them has been published. When records are not matched for processing by the streams app, they are moved to a Dead Letter topic instead of continuing to flow downstream. When new data is published, we dump the latest messages from the Dead Letter topic back into the stream application's source topic for reprocessing with the new data.

The Dead Letter processor:

- At the start of a run, the application records the ending offsets of each partition.
- The ending offsets mark the point at which to stop processing records for a given Dead Letter topic, to avoid an infinite loop if reprocessed records return to the Dead Letter topic.
- The application resumes from the last offsets produced by the previous run via consumer groups.
- The application uses transactions and KafkaProducer#sendOffsetsToTransaction to commit the last produced offsets.

To track when all records in my range have been processed for a topic's partition, my service compares the last produced offset from the producer to the consumer's saved map of ending offsets. When we reach the ending offset, the consumer pauses that partition via KafkaConsumer#pause, and when all partitions are paused (meaning they have reached the saved ending offset), the application exits.

The Kafka Consumer API states:

Offsets and Consumer Position: Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition. For example, a consumer which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5.

The Kafka Producer API documentation for sendOffsetsToTransaction also implies that the next offset is always +1:

Sends a list of specified offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered committed only if the transaction is committed successfully. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
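To make the "lastProcessedMessageOffset + 1" rule concrete, here is a minimal sketch in plain Java with no Kafka dependency. ConsumedRecord and CommitOffsets are hypothetical names made up for illustration. In a real application the resulting map would be a Map<TopicPartition, OffsetAndMetadata> passed to KafkaProducer#sendOffsetsToTransaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a consumed record; only partition and offset matter here.
final class ConsumedRecord {
    final int partition;
    final long offset;

    ConsumedRecord(int partition, long offset) {
        this.partition = partition;
        this.offset = offset;
    }
}

final class CommitOffsets {
    // For each partition, the offset to commit is the offset of the last
    // processed record plus one, i.e. the next record to consume.
    static Map<Integer, Long> nextOffsets(List<ConsumedRecord> processed) {
        Map<Integer, Long> next = new HashMap<>();
        for (ConsumedRecord r : processed) {
            // Keep the highest "offset + 1" seen per partition.
            next.merge(r.partition, r.offset + 1, Math::max);
        }
        return next;
    }

    public static void main(String[] args) {
        List<ConsumedRecord> batch = new ArrayList<>();
        batch.add(new ConsumedRecord(0, 10));
        batch.add(new ConsumedRecord(0, 13)); // offsets 11 and 12 were never delivered
        batch.add(new ConsumedRecord(1, 7));
        System.out.println(nextOffsets(batch));
    }
}
```

Note that the rule only says what to commit after the last record you processed. It does not by itself promise that the record you receive next will carry exactly that offset.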

But you can clearly see in my debugger that the offsets of the records consumed from a single partition do not increase by 1 at a time (debugger screenshot: https://i.stack.imgur.com/owkWX.png).

I thought this might be a Kafka configuration issue such as max.message.bytes, but none of the settings I looked at would explain it. Then I thought it might come from joining, but I don't see how that would change the way the producer functions.

Not sure whether it is relevant, but all of our Kafka applications use Avro and Schema Registry.

Should the offsets always increment by 1 regardless of how the records are produced, or is it possible that the Kafka Streams API does not offer the same guarantees as the normal producer and consumer clients?

Or is there something I am missing entirely?


asked Feb 11 '19 18:02

DVS

1 Answer

It is not an official API contract that message offsets are increased by one, even if the JavaDocs indicate this (it seems that the JavaDocs should be updated).

- If you don't use transactions, you get either at-least-once semantics or no guarantees (some call this at-most-once semantics). With at-least-once, a record might be written twice, so the offsets of two consecutive messages do not differ by one, because the duplicate write "consumes" two offsets.
- If you use transactions, each commit (or abort) of a transaction writes a commit (or abort) marker into the topic; those transactional markers also "consume" one offset each (this is what you observe).
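The effect of transaction markers can be illustrated with a toy model of a single partition. This is only a sketch: PartitionLogModel is a made-up class, not part of Kafka, and it assumes that each transaction writes its records followed by exactly one commit marker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition's log: every append, whether a data record or a
// transaction marker, takes the next offset.
final class PartitionLogModel {
    private long nextOffset = 0;
    private final List<Long> recordOffsets = new ArrayList<>();

    // Appends the records of one transaction, then its commit marker.
    void appendTransaction(int recordCount) {
        for (int i = 0; i < recordCount; i++) {
            recordOffsets.add(nextOffset++);
        }
        // The marker occupies an offset but is never handed to the application.
        nextOffset++;
    }

    // Offsets of the data records an application would actually see.
    List<Long> visibleRecordOffsets() {
        return recordOffsets;
    }

    // Offset that the next append would receive.
    long endOffset() {
        return nextOffset;
    }

    public static void main(String[] args) {
        PartitionLogModel log = new PartitionLogModel();
        for (int i = 0; i < 3; i++) {
            log.appendTransaction(2); // three transactions of two records each
        }
        System.out.println(log.visibleRecordOffsets()); // prints [0, 1, 3, 4, 6, 7]
        System.out.println(log.endOffset());            // prints 9
    }
}
```

In this model the last visible record has offset 7 while the end offset is 9, so a check of the form "last offset + 1 equals end offset" would never succeed.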

Thus, in general you should not rely on consecutive offsets. The only guarantee you get is that each offset is unique within a partition.
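For a stop condition like the one in the question, one option that does not depend on consecutive offsets is to compare the consumer's position (the next offset it would fetch) against the saved ending offset using ">=" instead of counting records. The sketch below is plain Java with a hypothetical EndOffsetTracker class. It assumes the ending offsets come from something like KafkaConsumer#endOffsets and the positions from KafkaConsumer#position, and that the position moves past transaction markers once they have been read.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides when each partition has been read up to the
// ending offset that was saved at the start of the run.
final class EndOffsetTracker {
    private final Map<Integer, Long> endOffsets;
    private final Set<Integer> finished = new HashSet<>();

    EndOffsetTracker(Map<Integer, Long> endOffsetsAtStart) {
        this.endOffsets = new HashMap<>(endOffsetsAtStart);
    }

    // position is the next offset the consumer would fetch for this partition.
    // Returns true once the partition has reached or passed its saved end.
    boolean update(int partition, long position) {
        Long end = endOffsets.get(partition);
        if (end != null && position >= end) {
            finished.add(partition); // the caller would pause this partition
        }
        return finished.contains(partition);
    }

    // True when every tracked partition has reached its saved end offset.
    boolean allFinished() {
        return finished.containsAll(endOffsets.keySet());
    }

    public static void main(String[] args) {
        Map<Integer, Long> ends = new HashMap<>();
        ends.put(0, 9L);
        ends.put(1, 3L);
        EndOffsetTracker tracker = new EndOffsetTracker(ends);
        System.out.println(tracker.update(0, 8)); // prints false
        System.out.println(tracker.update(0, 9)); // prints true
        System.out.println(tracker.update(1, 5)); // prints true
        System.out.println(tracker.allFinished()); // prints true
    }
}
```

Because the comparison is ">=" on positions, gaps left by duplicates or transaction markers do not prevent a partition from being marked as finished.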


answered Sep 20 '22 14:09

Matthias J. Sax
