 

What is the reasoning behind Kafka Connect Schemas?

We are writing a custom sink connector for writing the content of a topic with Avro messages to Ceph storage.

To do this, we are given SinkRecords that carry a Kafka Connect schema, which is a mapped version of our Avro schema. Since we want to write Avro to Ceph, we use the Connect API methods to convert the Connect schema back to Avro. Why do we need to do this? What are the benefits of introducing the Kafka Connect schema instead of using the more commonly adopted Avro schema?
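For context, a minimal sketch of the kind of conversion we mean, assuming Confluent's AvroData helper from the Avro converter artifact (the wrapper class here is hypothetical):

```java
import io.confluent.connect.avro.AvroData;
import org.apache.kafka.connect.sink.SinkRecord;

// Hypothetical wrapper: converts a SinkRecord's Connect schema and value back to Avro
// using io.confluent.connect.avro.AvroData.
public class ConnectToAvroExample {

    // Cache size for schema conversions; 100 is an arbitrary choice for this sketch.
    private final AvroData avroData = new AvroData(100);

    public Object toAvro(SinkRecord record) {
        // Map the Connect schema back to an org.apache.avro.Schema.
        org.apache.avro.Schema avroSchema = avroData.fromConnectSchema(record.valueSchema());

        // Convert the Connect value (e.g. a Struct) into an Avro object
        // (typically a GenericRecord for record types) that can then be written out, e.g. to Ceph.
        Object avroValue = avroData.fromConnectData(record.valueSchema(), record.value());

        System.out.println("Converted back to Avro schema: " + avroSchema);
        return avroValue;
    }
}
```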

FYI: I am asking this because we have run into problems with Avro unions. Their mapping to the Kafka Connect schema still has some issues, e.g. https://github.com/confluentinc/schema-registry/commit/31648f0d34b10c1b36e8ec6f4c1236ed3fe86495#diff-0a8d4f17f8d4a68f2f0d2dcd9211df84

asked Jan 30 '23 by longliveenduro


1 Answer

Kafka Connect defines its own schema structure because the framework isolates connectors from any knowledge of how the messages are serialized in Kafka. This makes it possible to use any connector with any Converter. Without this separation, connectors would expect the messages to be serialized in a particular form, making them harder to reuse.
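As an illustration of that separation, a sink task only ever sees the Connect data API; the sketch below (class and field names are made up) works the same whether the topic held Avro, JSON, or Protobuf, because the configured Converter has already produced Schema/Struct values:

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sink task: it works only against Connect's schema-neutral data API,
// with no knowledge of how the messages were serialized in Kafka.
public class CephSinkTask extends SinkTask {

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // The configured Converter has already deserialized the message.
            Struct value = (Struct) record.value();
            Object someField = value.get("someField"); // field name is illustrative
            // ... write the field/record to Ceph ...
        }
    }

    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
    @Override public String version() { return "0.1"; }
}
```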

If you know that all messages are serialized with a specific Avro schema, you can always configure your sink connector to use the ByteArrayConverter for keys and values, and then your connector can deal with the messages in the serialized form.
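For instance, a connector configuration along these lines (connector name, connector class, and topic are made up for the example) would hand the raw bytes straight to the task:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical connector configuration: overriding the converters so the sink task
// receives raw byte[] keys and values instead of Connect Schema/Struct data.
public class ByteArrayConverterConfigExample {

    public static Map<String, String> cephSinkConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("name", "ceph-avro-sink");                            // hypothetical connector name
        config.put("connector.class", "com.example.CephSinkConnector");  // hypothetical connector class
        config.put("topics", "avro-topic");                              // hypothetical topic
        config.put("key.converter", "org.apache.kafka.connect.converters.ByteArrayConverter");
        config.put("value.converter", "org.apache.kafka.connect.converters.ByteArrayConverter");
        return config;
    }
}
```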

However, be aware that if the messages are serialized using Confluent's Avro serializer (or the Avro converter in a source connector), then the binary form of the keys and values will include the magic byte and Avro schema identifier in the leading byte(s). The remaining content of the byte arrays will be the Avro serialized form.
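As a rough illustration of handling that wire format by hand (assuming the documented layout of a zero magic byte followed by a 4-byte schema ID; the class is hypothetical and the Schema Registry lookup for the ID is omitted):

```java
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical helper: strips the Confluent wire-format header (1 magic byte + 4-byte
// schema ID) and decodes the remaining bytes as plain Avro binary.
public class ConfluentWireFormat {

    private static final byte MAGIC_BYTE = 0x0;

    // 'writerSchema' must be the schema registered under the embedded ID;
    // fetching it from the Schema Registry is out of scope for this sketch.
    public static GenericRecord decode(byte[] payload, Schema writerSchema) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Not in Confluent wire format");
        }
        int schemaId = buffer.getInt(); // 4-byte schema ID; could be used for a registry lookup

        // The remaining bytes are the standard Avro binary encoding of the record.
        byte[] avroBytes = new byte[buffer.remaining()];
        buffer.get(avroBytes);

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
        return reader.read(null, decoder);
    }
}
```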

answered Feb 22 '23 by Randall Hauch