 

Avro: Reusing a decoder

When creating a decoder to pass to a DatumReader, there is an option to reuse an existing decoder, as shown below. As mentioned in the docs, the decoder class is immutable and thread-safe, so it seems to make sense to reuse it. What is the best practice: should the decoder be reused? Is there a performance overhead if we create a new decoder every time we decode an Avro payload?

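import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.reflect.ReflectDatumReader;
DatumReader<T> reader = new ReflectDatumReader<>(writerSchema, readerSchema);
// Second argument is an existing decoder to reuse (null creates a new one)
BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(record, null);
T datum = reader.read(null, binaryDecoder);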
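For comparison, here is a minimal, stdlib-only sketch of the reuse pattern itself. `ReusableLongDecoder` is hypothetical and is not Avro's `BinaryDecoder`: it only reads Avro-style zig-zag varint longs, and its `reset` method plays the role of passing an existing decoder as the second argument to `DecoderFactory.binaryDecoder`. It shows the shape of the pattern, not a measured performance difference.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for zig-zag varint longs; NOT Avro's BinaryDecoder. */
final class ReusableLongDecoder {
    private byte[] buf = new byte[0];
    private int pos;

    /** Re-points this instance at a new payload instead of allocating a new decoder. */
    ReusableLongDecoder reset(byte[] payload) {
        this.buf = payload;
        this.pos = 0;
        return this;
    }

    boolean hasMore() { return pos < buf.length; }

    long readLong() {
        long raw = 0;
        int shift = 0;
        while (true) {
            if (pos >= buf.length) throw new IllegalStateException("truncated varint");
            int b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift; // 7 payload bits per byte, low group first
            if ((b & 0x80) == 0) break;        // high bit clear: last byte of this value
            shift += 7;
            if (shift >= 64) throw new IllegalStateException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
    }
}

final class ZigZagDemo {
    /** Encodes longs as zig-zag varints, only to produce test payloads. */
    static byte[] encode(long... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : values) {
            long n = (v << 1) ^ (v >> 63); // zig-zag: small magnitudes become small unsigned values
            while ((n & ~0x7FL) != 0) {
                out.write((int) ((n & 0x7F) | 0x80));
                n >>>= 7;
            }
            out.write((int) n);
        }
        return out.toByteArray();
    }

    /** Decodes every payload with one decoder instance, allocated once. */
    static List<Long> decodeAll(List<byte[]> payloads) {
        ReusableLongDecoder decoder = new ReusableLongDecoder();
        List<Long> result = new ArrayList<>();
        for (byte[] payload : payloads) {
            decoder.reset(payload);
            while (decoder.hasMore()) result.add(decoder.readLong());
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> payloads = List.of(encode(1, -1, 300), encode(Long.MIN_VALUE, Long.MAX_VALUE));
        System.out.println(decodeAll(payloads));
    }
}
```

Because the decoder keeps a read position, it is mutable state; the sketch reuses one instance within a single thread rather than sharing it across threads.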
asked Mar 13 '26 01:03 by user_1357


1 Answer

Reusing the decoder this way shouldn't be necessary, and it likely won't improve performance. Let us know if you observe otherwise.

The Coder is created and serialized once, during pipeline submission, as part of the job description. Each backend worker deserializes it to create the instances used for a given unit of work. Units of work are large enough that this doesn't create many coders, so coder creation shouldn't be a performance problem.
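A rough simulation of that lifecycle, using plain Java serialization as a stand-in for however the runner actually ships the job description (an assumption, not the runner's real mechanism): the hypothetical `BundleCoder` is serialized once at "submission", and each simulated unit of work deserializes one copy and reuses it for every record in that unit. The counter tracks how many coders are materialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical coder that carries only its schema text, like a coder embedded in a job description. */
final class BundleCoder implements Serializable {
    private static final long serialVersionUID = 1L;
    /** Counts how many coder instances workers materialized from the serialized form. */
    static final AtomicInteger DESERIALIZATIONS = new AtomicInteger();
    private final String schemaText;

    BundleCoder(String schemaText) { this.schemaText = schemaText; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        DESERIALIZATIONS.incrementAndGet();
    }

    /** Stand-in for real decoding: turns bytes into a string. */
    String decode(byte[] payload) { return new String(payload, StandardCharsets.UTF_8); }
}

final class JobSimulation {
    /** "Submission": the coder is serialized exactly once. */
    static byte[] submit(BundleCoder coder) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(coder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** "Worker": one deserialized coder per unit of work, reused for every record in it. */
    static int processBundle(byte[] jobDescription, int records) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(jobDescription))) {
            BundleCoder coder = (BundleCoder) in.readObject();
            int decoded = 0;
            for (int i = 0; i < records; i++) {
                coder.decode(("record-" + i).getBytes(StandardCharsets.UTF_8));
                decoded++;
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With 3 units of work of 1,000 records each, 3,000 records are decoded but only 3 coders are deserialized: the count follows the number of work units, not the number of records.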

Additionally, AvroCoder serializes the schema, so the reflection used to derive it won't run multiple times. Also, simply sharing the instance (as described in the question) won't cause the deserialized instances to be shared.
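Both points can be illustrated with plain Java serialization. This is a sketch under that assumption, not AvroCoder's actual code: the hypothetical `ReflectiveCoder` runs its reflection-based "schema derivation" in the constructor and stores only the resulting text. Deserializing a `Serializable` object does not re-run its own constructor, so the reflection happens once; and each deserialization produces a separate object, so sharing the original instance does not make the workers share one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical record type whose "schema" is derived by reflection. */
final class UserRecord {
    String name;
    long id;
}

/** Hypothetical coder: reflects once in the constructor, then only carries the result. */
final class ReflectiveCoder implements Serializable {
    private static final long serialVersionUID = 1L;
    /** Counts reflection runs; static, so it is not part of the serialized form. */
    static final AtomicInteger DERIVATIONS = new AtomicInteger();
    private final String schemaText;

    ReflectiveCoder(Class<?> type) {
        DERIVATIONS.incrementAndGet();
        StringBuilder sb = new StringBuilder(type.getSimpleName()).append('{');
        for (Field f : type.getDeclaredFields()) {
            sb.append(f.getName()).append(':').append(f.getType().getSimpleName()).append(';');
        }
        this.schemaText = sb.append('}').toString();
    }

    String schemaText() { return schemaText; }
}

final class CoderShipping {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Each call yields a fresh object; the ReflectiveCoder constructor is not invoked. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```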

answered Mar 14 '26 15:03 by Ben Chambers