I am seeing conflicting information about this in different places online, so I would appreciate an authoritative answer from someone who actually knows.
Suppose I am serializing some stuff to Avro:
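import java.io.IOException;
import java.io.OutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

class StuffToAvro {
    private final Schema schema;
    StuffToAvro(Schema schema) { this.schema = schema; }
    void apply(GenericRecord stuff, OutputStream out) throws IOException {
        // A new encoder and a new writer are created on every call.
        final Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        final GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        writer.write(stuff, encoder);
        encoder.flush(); // the binary encoder buffers its output
    }
}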
The question is whether I can/should optimize it by reusing the encoder and writer and, if so, what the right way to do it is: can I just initialize the writer up front and make it final, for example, or does it need to be a ThreadLocal?
A similar question about the encoder: should I remember the previous instance and pass it to binaryEncoder to reuse, or does that need to be a ThreadLocal as well?
In each case, if the answer is ThreadLocal, I'd also like to know whether such an optimization is worth the complexity: is it actually expensive to create a brand new writer and/or encoder every time rather than reusing them?
Also, I assume that whatever answers I get here apply to reading/decoding as well. Is that right?
Appreciate any pointers.
Thank you!
Per this post:
Yes, a DatumReader instance may be used in multiple threads. Encoder and Decoder are not thread-safe, but DatumReader and DatumWriter are.
Writers are thread-safe too.
Yes, re-using a single GenericDatumWriter to write multiple objects should improve performance.
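Putting the quoted statements together, here is a minimal sketch of what the reuse might look like, assuming Avro is on the classpath. It is not a definitive implementation: the field name lastEncoder is made up for illustration, and the writer is shared only because the quoted post says DatumWriter is thread-safe.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

class StuffToAvro {
    // One writer shared by all threads (relies on the quoted claim that
    // DatumWriter is thread-safe).
    private final GenericDatumWriter<GenericRecord> writer;

    // Encoders are not thread-safe, so each thread keeps its own instance.
    // "lastEncoder" is a hypothetical name chosen for this sketch.
    private final ThreadLocal<BinaryEncoder> lastEncoder = new ThreadLocal<>();

    StuffToAvro(Schema schema) {
        this.writer = new GenericDatumWriter<>(schema);
    }

    void apply(GenericRecord stuff, OutputStream out) throws IOException {
        // The second argument is the encoder to reuse; it is null on a
        // thread's first call, in which case the factory creates a new one.
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, lastEncoder.get());
        lastEncoder.set(encoder);
        writer.write(stuff, encoder);
        // The binary encoder buffers, so flush before the caller reads "out".
        encoder.flush();
    }
}
```

Assuming the same thread-safety split holds on the read side, as the quoted post suggests, the mirror image would be a shared GenericDatumReader plus a per-thread decoder passed as the reuse argument of DecoderFactory.get().binaryDecoder(in, reuse). Whether the ThreadLocal part is worth the extra complexity is something this sketch does not measure.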