When creating a decoder to pass to a DatumReader, there is an option to reuse an existing decoder, as shown below. According to the docs, the decoder class is immutable and thread-safe, so reusing it seems to make sense. What is the best practice here: should the decoder be reused? Is there a performance overhead if we create a new decoder each time we decode an Avro payload?
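DatumReader<T> reader = new ReflectDatumReader<>(writerSchema, readerSchema);
// Second argument is a decoder to be reused (null creates a new one)
BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(record, null);
// First argument is an existing datum to reuse (null allocates a new one)
T datum = reader.read(null, binaryDecoder);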
This kind of optimization shouldn't be necessary, and likely won't actually improve performance. Let us know if you observe otherwise.
Basically, the Coder is created and serialized during pipeline submission as part of the job description. It is deserialized on each backend to create the instances for a given unit of work. These units of work are large enough that it shouldn't be creating tons of coders, so it shouldn't be a performance problem.
Additionally, AvroCoder serializes the schema, so reflection won't run multiple times. Also, merely sharing the instance (as described) won't cause the deserialized instances to be shared.
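To illustrate the last point, here is a minimal sketch that uses plain Java serialization as a stand-in for how a coder travels inside the job description. `FakeCoder`, `CoderCopyDemo`, and the variable names are hypothetical and not part of Beam or Avro; the real serialization path may differ. It only shows that deserializing the same bytes twice yields two separate objects, so sharing one instance at submission time does not make the deserialized copies shared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CoderCopyDemo {
    /** Hypothetical stand-in for a coder: serializable and carrying its schema as a string. */
    static final class FakeCoder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String schemaJson;

        FakeCoder(String schemaJson) {
            this.schemaJson = schemaJson;
        }
    }

    /** Serializes a value to bytes, as would happen once at submission time. */
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Deserializes bytes into a fresh object, as each unit of work would do. */
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeCoder shared = new FakeCoder("{\"type\":\"string\"}");
        byte[] jobDescription = serialize(shared);

        // Two units of work deserialize the same bytes independently.
        FakeCoder workerA = (FakeCoder) deserialize(jobDescription);
        FakeCoder workerB = (FakeCoder) deserialize(jobDescription);

        System.out.println(workerA == workerB);                            // prints "false"
        System.out.println(workerA.schemaJson.equals(workerB.schemaJson)); // prints "true"
    }
}
```

The schema string survives the round trip intact, which mirrors the point that the schema is serialized with the coder rather than re-derived by reflection each time.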