I don't see any failure while producing or consuming the data however there are bunch of duplicate messages in production. For a small topic which gets around 100k messages, there are ~4k duplicates though like I said there is no failure and on top of that there is no retry logic implemented or config value is set.
I also check offset values for those duplicate messages and each has distinct values which tells me that the issue is in producer.
Any help would be highly appreciated
Read more about message delivery in kafka:
https://kafka.apache.org/08/design.html#semantics
So effectively Kafka guarantees at-least-once delivery by default and allows the user to implement at most once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system but Kafka provides the offset which makes implementing this straight-forward.
Probably you are looking for "exactly once delivery" like in jms
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
There are two approaches to getting exactly once semantics during data production: 1. Use a single-writer per partition and every time you get a network error check the last message in that partition to see if your last write succeeded 2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.
We implemented second point in our systems.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With