Validation for Kafka topic messages

Tags:

apache-kafka

I'm working with Kafka and I've been asked to validate the messages that are sent to Kafka, but I don't like the solutions I've come up with, so I hope someone can advise me on this.

We have many producers outside our control, so they can send any message in any kind of format, and we could have as many as 80 million records sent, which should be processed in less than 2 hours. I've been asked to:

  • Validate the format (JSON, since it has to be compatible with MongoDB).

  • Validate some of the fields sent.

  • Rename some fields.

The last two requirements are to be done using parameters stored in MongoDB. All of this should be done assuming we are not the only ones writing consumers, so there should be a "simple" call to our service that performs this validation. Any ideas?

asked Jan 27 '23 by Giacomo Giovannini

1 Answer

This is often done with a Kafka Streams job.

You have "raw" input topics where your producers send events. Then a Streams job reads from these topics and writes valid records into "clean" topics. In Streams you can do all sort of processing to check records or enrich them if needed.

You probably also want to write bad records into a dead letter queue topic so you can investigate why they happened.
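
As a rough illustration, here is a minimal sketch of such a Streams topology in Java. Everything specific in it is assumed for the example: the topic names ("raw-events", "clean-events", "invalid-events-dlq"), the required field and the field rename, the broker address, and the use of Jackson to parse the JSON. In your setup the field rules and the rename mapping would be loaded from the parameters stored in MongoDB rather than hard-coded.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class MessageValidator {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Hypothetical check: the record must be parseable JSON and contain a
    // field we require. Real field rules would come from MongoDB.
    static boolean isValid(String value) {
        try {
            return MAPPER.readTree(value).has("requiredField");
        } catch (Exception e) {
            return false;
        }
    }

    // Hypothetical rename: move "oldName" into "newName". In practice the
    // mapping would be read from the parameters stored in MongoDB.
    static String rename(String value) {
        try {
            ObjectNode node = (ObjectNode) MAPPER.readTree(value);
            if (node.has("oldName")) {
                node.set("newName", node.remove("oldName"));
            }
            return MAPPER.writeValueAsString(node);
        } catch (Exception e) {
            return value; // only valid records reach this step
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-validator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("raw-events");

        // Valid records are renamed/enriched and written to the clean topic.
        raw.filter((key, value) -> isValid(value))
           .mapValues(MessageValidator::rename)
           .to("clean-events");

        // Everything else goes to a dead letter topic for later inspection.
        raw.filterNot((key, value) -> isValid(value))
           .to("invalid-events-dlq");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```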

Then your consumers can read from the clean topics to ensure they only see validated data.
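
For completeness, a downstream consumer then only needs to subscribe to the clean topic and never touches the raw ones. A minimal sketch, again with assumed topic name, group id and broker address:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CleanTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe only to the validated topic produced by the Streams job.
            consumer.subscribe(List.of("clean-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```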

This solution adds some latency, as records have to be "processed" before reaching consumers. You also want to run the Streams job close to the Kafka cluster because, depending on how much you want to validate, it might need to ingest large amounts of data.

Also see Handling bad messages using Kafka's Streams API where some of these concepts are detailed.

answered Feb 05 '23 by Mickael Maison