
Avro serialization: which parts are and aren't thread-safe?

I am seeing some conflicting information about this in different places online, so I would appreciate an authoritative answer from someone who actually knows.

Suppose I am serializing some stuff to Avro:

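    import java.io.IOException;
    import java.io.OutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;

    class StuffToAvro {
       private final Schema schema;
       StuffToAvro(Schema schema) { this.schema = schema; }

       void apply(GenericRecord stuff, OutputStream out) throws IOException {
         // A new encoder and a new writer are created on every call.
         final Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
         final GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
         writer.write(stuff, encoder);
         encoder.flush(); // the binary encoder buffers, so flush to push the bytes to out
       }
    }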

The question is whether I can/should optimize it by reusing the encoder and writer, and, if I should, what is the right way to do it: can I just initialize the writer upfront and make it final for example, or does it need to be a ThreadLocal?

A similar question about the encoder: should I remember the previous instance and pass it to binaryEncoder for reuse, or does that need to be a ThreadLocal as well?

In each case, if the answer is ThreadLocal, I'd also like to know whether such optimization is worth the complexity: is it actually expensive to create a brand new writer and/or encoder every time rather than reusing them?

Also, I assume that whatever answers I get here apply to reading/decoding as well. Is that right?

Appreciate any pointers.

Thank you!

Dima asked Oct 17 '22 13:10


1 Answer

Per this post:

Yes, a DatumReader instance may be used in multiple threads. Encoder and Decoder are not thread-safe, but DatumReader and DatumWriter are.

Writers are thread-safe too.

Yes, re-using a single GenericDatumWriter to write multiple objects should improve performance.
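Putting the statements above together, here is a minimal sketch of how the class from the question could look. It assumes the quoted claims hold for your Avro version: the writer is shared as a final field, and the encoder, which is not thread-safe, is remembered per thread. The `lastEncoder` field name is my own choice for illustration, not something from the Avro API.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

class StuffToAvro {
    // Shared by all threads, relying on the quoted statement that DatumWriter is thread-safe.
    private final GenericDatumWriter<GenericRecord> writer;

    // Encoders are not thread-safe, so each thread remembers its own instance for reuse.
    // (Hypothetical field name, chosen for this sketch.)
    private final ThreadLocal<BinaryEncoder> lastEncoder = new ThreadLocal<>();

    StuffToAvro(Schema schema) {
        this.writer = new GenericDatumWriter<>(schema);
    }

    void apply(GenericRecord stuff, OutputStream out) throws IOException {
        // On a thread's first call the reuse argument is null and a new encoder is created;
        // afterwards the thread's previous encoder is handed back for reuse against `out`.
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, lastEncoder.get());
        lastEncoder.set(encoder);

        writer.write(stuff, encoder);
        encoder.flush(); // the binary encoder buffers, so flush before the caller reads `out`
    }
}
```

Whether the ThreadLocal for the encoder is worth the extra complexity is something I would measure rather than assume. Passing `null` as the reuse argument and creating a fresh encoder per call is the simpler option, and it is also safe across threads because nothing is shared.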

ah.narayanan answered Oct 21 '22 03:10