
Kafka having duplicate messages

I don't see any failures while producing or consuming the data, yet there are a bunch of duplicate messages in production. For a small topic that gets around 100k messages, there are ~4k duplicates, even though, as I said, there are no failures, and on top of that no retry logic is implemented and no retry config value is set.

I also checked the offset values of those duplicate messages, and each has a distinct value, which tells me the issue is on the producer side.

Any help would be highly appreciated.

asked Dec 02 '15 by East2West

1 Answer

Read more about message delivery in Kafka:

https://kafka.apache.org/08/design.html#semantics

So effectively Kafka guarantees at-least-once delivery by default and allows the user to implement at most once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system but Kafka provides the offset which makes implementing this straight-forward.
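As a minimal sketch of what "disabling retries on the producer" looks like in configuration, assuming the newer Java producer client (the 0.8 docs the link points to describe the older Scala producer) and placeholder broker address and topic name:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AtMostOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Disabling retries means a failed send is never re-attempted,
        // so this producer can never hand the broker the same message twice.
        props.put("retries", "0");
        props.put("acks", "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "some-key", "some-value")); // placeholder topic
        }
    }
}
```

Note that this trades duplicates for possible message loss (at-most-once); with retries enabled you get at-least-once, which is where duplicates come from.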

You are probably looking for "exactly-once delivery", as in JMS:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?

There are two approaches to getting exactly-once semantics during data production:

1. Use a single writer per partition, and every time you get a network error check the last message in that partition to see if your last write succeeded.
2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.

We implemented the second approach (a UUID key plus consumer-side deduplication) in our systems; a sketch follows below.
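Here is a minimal sketch of that second approach, assuming the producer puts a UUID in the record key and using an in-memory set as the deduplication store (in a real system you would keep the seen IDs in a persistent store); broker address, group id, and topic name are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "dedup-demo");               // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // IDs of messages we have already processed.
        // In-memory only for illustration; production code would persist this.
        Set<String> seenIds = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String messageId = record.key(); // assumed: producer set a UUID as the key
                    if (messageId != null && !seenIds.add(messageId)) {
                        continue; // duplicate: this ID was already processed
                    }
                    process(record.value());
                }
            }
        }
    }

    private static void process(String value) {
        System.out.println("processing: " + value);
    }
}
```

The duplicates still arrive from Kafka; they are simply dropped before your processing logic sees them, which is what gives the downstream system exactly-once behaviour.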

answered Nov 15 '22 by Anatoly Deyneka