Schema in Avro message

I see that Avro messages have the schema embedded, followed by the data in binary format. If multiple messages are sent and a new Avro file is created for every message, isn't embedding the schema an overhead? Does that mean it is always important for the producer to batch messages before writing, so that multiple messages written into one Avro file carry just one schema? On a different note, is there an option to skip embedding the schema when serializing with GenericDatumWriter or SpecificDatumWriter?

Roshan Fernando asked Oct 18 '25 23:10


1 Answer

You are correct: there is an overhead if you write a single record together with its schema. This may seem wasteful, but in some scenarios being able to reconstruct a record from the data using the embedded schema matters more than the size of the payload.
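
To put a rough number on that overhead, here is a minimal Python sketch that hand-encodes records following the Avro binary encoding rules (zigzag varint longs, length-prefixed UTF-8 strings, fields written in schema order with no names). The `User` schema and its fields are hypothetical, and the size model is deliberately simplified: it counts only the schema JSON and the record bytes, ignoring the magic bytes, metadata map and sync markers that a real Avro object container file adds. Writing the schema once per batch is what batching into a single container file buys you; the bare record bytes with no schema at all correspond to what a DatumWriter emits when it writes through a BinaryEncoder rather than a DataFileWriter, in which case the reader must obtain the schema some other way.

```python
import json

# Hypothetical example schema; any record schema would illustrate the same point.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}
SCHEMA_BYTES = json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8")


def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encode, then write as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user_id: int, name: str) -> bytes:
    """Record fields are concatenated in schema order, without names or tags."""
    return encode_long(user_id) + encode_string(name)


records = [encode_user(i, f"user{i}") for i in range(100)]
payload = sum(len(r) for r in records)

# Simplified model: schema copied next to every record vs. written once per batch.
one_schema_per_record = payload + len(records) * len(SCHEMA_BYTES)
one_schema_per_batch = payload + len(SCHEMA_BYTES)
print(f"schema={len(SCHEMA_BYTES)} B, records={payload} B")
print(f"per-record schema: {one_schema_per_record} B, per-batch schema: {one_schema_per_batch} B")
```

With 100 small records the schema text is many times larger than any single encoded record, so repeating it per message dominates the total size.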

Also take into account that, even with the schema included, the records themselves are encoded in a compact binary format that carries no field names, so a file holding many records is usually smaller than the equivalent JSON anyway.
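
Whether Avro plus its schema beats JSON depends on how many records share that schema. The sketch below, again a hand-rolled Python illustration of the Avro binary encoding rather than the Avro library itself, compares one record against a batch of 1000; the schema and the `users` data are hypothetical. JSON repeats every field name in every record, while Avro writes only the values and pays for the schema once.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encode, then write as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical schema and data, used only for the size comparison.
SCHEMA_JSON = json.dumps(
    {"type": "record", "name": "User",
     "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]},
    separators=(",", ":"),
).encode("utf-8")
users = [{"id": i, "name": f"user{i}"} for i in range(1000)]


def avro_size(rows) -> int:
    """Total bytes of the Avro-encoded records (values only, no field names)."""
    return sum(len(encode_long(r["id"]) + encode_string(r["name"])) for r in rows)


def json_size(rows) -> int:
    """Total bytes of one compact JSON object per record."""
    return sum(len(json.dumps(r, separators=(",", ":")).encode("utf-8")) for r in rows)


for n in (1, len(users)):
    rows = users[:n]
    print(f"{n} record(s): Avro+schema={len(SCHEMA_JSON) + avro_size(rows)} B, "
          f"JSON={json_size(rows)} B")
```

For a single record the schema outweighs the JSON, which is exactly the overhead the question describes; for the batch the binary encoding wins comfortably.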

Finally, frameworks like Kafka can plug into a Schema Registry: rather than storing the schema with each record, they store a pointer to the schema (a small schema ID), and consumers look the schema up in the registry.
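
As an illustration of the "pointer" idea: the Avro serializers for Confluent Schema Registry frame each Kafka message as a magic byte `0`, a 4-byte big-endian schema ID, and then the Avro binary payload with no schema attached. The stdlib sketch below reproduces only that framing. The schema ID `42` and the in-memory `REGISTRY` dict are hypothetical stand-ins for a real registry lookup, and the payload is a hand-encoded record `{"id": 1, "name": "user1"}`.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # 1-byte magic + 4-byte big-endian schema ID

# Hypothetical stand-in for a schema registry: schema ID -> schema JSON.
REGISTRY = {
    42: '{"type":"record","name":"User","fields":'
        '[{"name":"id","type":"long"},{"name":"name","type":"string"}]}'
}


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix a raw Avro payload with the magic byte and the schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple[str, bytes]:
    """Split a framed message and resolve its schema via the (hypothetical) registry."""
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return REGISTRY[schema_id], message[HEADER.size:]


payload = b"\x02\x0auser1"  # Avro binary for {"id": 1, "name": "user1"}
message = frame(42, payload)
schema, body = unframe(message)
print(len(message), body == payload)
```

Each message then costs 5 bytes of framing instead of a full copy of the schema, at the price of needing the registry available to both producer and consumer.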

sksamuel answered Oct 20 '25 13:10
