Hi, I have an architecture similar to the image shown below.
I have two Kafka producers which send messages to a Kafka topic, with frequent duplicate messages.
Is there an easy way to handle this situation, something like the duplicate detection offered by an Azure Service Bus topic?
Thank you for your help.
Assuming that you actually have multiple different producers writing the same messages, I can see these two options:
1) Write all duplicates to a single Kafka topic, then use something like Kafka Streams (or any other stream processor like Flink, Spark Streaming, etc.) to deduplicate the messages and write deduplicated results to a new topic.
Here's a great Kafka Streams example using state stores: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java
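The linked example uses a windowed state store inside a Kafka Streams transformer. The core idea can be sketched in plain Python (an in-memory stand-in for the state store, not actual Kafka Streams code; names like `DeduplicationStore` and the `"id"` field are illustrative assumptions):

```python
import time


class DeduplicationStore:
    """In-memory stand-in for a Kafka Streams windowed state store.

    Remembers event IDs for `retention_seconds`, so a duplicate
    arriving within that window can be dropped.
    """

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._seen = {}  # event_id -> last-seen timestamp

    def is_duplicate(self, event_id, now=None):
        now = time.time() if now is None else now
        # Evict entries that have fallen out of the retention window.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self.retention_seconds}
        duplicate = event_id in self._seen
        self._seen[event_id] = now
        return duplicate


def deduplicate(events, store):
    """Keep the first occurrence of each event ID, drop later ones."""
    return [e for e in events if not store.is_duplicate(e["id"])]


store = DeduplicationStore(retention_seconds=600)
events = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 1}]
print(deduplicate(events, store))  # only the first "a" and "b" survive
```

In the real Kafka Streams version the store is persistent and fault-tolerant (backed by a changelog topic), and eviction happens via the store's retention, not a manual sweep.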
2) Make sure that duplicated messages have the same message key. After that you need to enable log compaction and Kafka will eventually get rid of the duplicates. This approach is less reliable, but if you tweak the compaction settings properly it might give you what you want.
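For compaction to collapse duplicates, both producers have to derive the message key the same way. One common approach, sketched here under the assumption that the message body itself identifies the event, is to hash a canonical serialization of the payload:

```python
import hashlib
import json


def message_key(payload: dict) -> str:
    """Derive a deterministic key from the message content, so that
    identical messages from different producers share a key."""
    # Sort keys so logically equal payloads serialize identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Both producers compute the same key for the same logical message:
a = message_key({"order_id": 42, "status": "shipped"})
b = message_key({"status": "shipped", "order_id": 42})
print(a == b)  # True
```

Keep in mind that compaction only runs on closed log segments and keeps the latest record per key, so duplicates can linger for a while before being cleaned up.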
Now, Apache Kafka also supports exactly-once semantics: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ Note, though, that this addresses duplicates introduced by retries within a single producer; it does not deduplicate the same logical message sent by two independent producers.
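If your duplicates come from producer retries rather than from two independent producers, the idempotent producer already solves the problem. A sketch of the relevant producer settings (the `transactional.id` value is a placeholder, and is only needed for transactional read-process-write pipelines):

```properties
# Idempotent producer: the broker deduplicates retried sends of the same batch.
enable.idempotence=true
acks=all
# Only required for transactional exactly-once processing:
transactional.id=my-dedup-app-1
```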