Avro: Reusing a decoder

Question

When creating a decoder to pass to a DatumReader, there is an option to reuse an existing decoder, as shown below. According to the documentation, the decoder class is immutable and thread-safe, so reusing it seems to make sense. What is the best practice: should the decoder be reused? Is there a performance overhead if we create a new decoder each time we decode an Avro payload?

DatumReader<T> reader = new ReflectDatumReader<>(writerSchema, readerSchema);
// The second argument is an existing BinaryDecoder to reuse; null creates a new one
BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(record, null);
T value = reader.read(null, binaryDecoder);

Ben Chambers · Accepted Answer

This kind of optimization shouldn't be necessary, and likely won't actually improve performance. Let us know if you observe otherwise.

Basically, the Coder is created and serialized during pipeline submission as part of the job description. It is deserialized on each backend to create the instances for a given unit of work. These units of work are large enough that it shouldn't be creating tons of coders, so it shouldn't be a performance problem.

Additionally, AvroCoder serializes the schema, so the reflection won't run more than once. Also, just sharing the instance (as described) won't actually cause the deserialized instances to be shared.
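To illustrate that last point, here is a minimal sketch using plain Java serialization as a stand-in for however the service actually ships the coder to the workers (that mechanism is an assumption here). `SketchCoder`, `schemaJson` and `cachedDecoder` are hypothetical names made up for the example, not part of Avro or the Dataflow SDK. It shows that deserializing the same bytes twice gives two independent instances, and that transient per-instance state such as a cached decoder does not travel with the coder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CoderCopies {
    /** Hypothetical stand-in for a coder; not a real Avro or Dataflow class. */
    static class SketchCoder implements Serializable {
        private static final long serialVersionUID = 1L;
        // Serialized with the coder, the way the answer says the schema is.
        final String schemaJson;
        // Per-instance scratch state (e.g. a decoder kept for reuse); transient, so never serialized.
        transient Object cachedDecoder;

        SketchCoder(String schemaJson) {
            this.schemaJson = schemaJson;
        }
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static SketchCoder deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SketchCoder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchCoder original = new SketchCoder("{\"type\":\"string\"}");
        original.cachedDecoder = new Object(); // "shared" state set up before submission

        // Serialize once (as at submission), deserialize once per unit of work.
        byte[] jobDescription = serialize(original);
        SketchCoder workerA = deserialize(jobDescription);
        SketchCoder workerB = deserialize(jobDescription);

        System.out.println(workerA == workerB);                            // false
        System.out.println(workerA.schemaJson.equals(workerB.schemaJson)); // true
        System.out.println(workerA.cachedDecoder == null);                 // true
    }
}
```

Each deserialized copy carries the schema but starts with no cached state of its own, so an instance shared at pipeline construction time is not what the workers end up using.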

Tags: java, avro, google-cloud-dataflow

user_1357
