 

Avro schema resolution for evolving a field from a primitive to a union

Tags: java, avro

I'm working with Avro 1.7.0 via its Java generic representation API, and I have a problem with a schema-evolution case we're currently dealing with: making a primitive-type field optional by changing it to a union of null and that primitive type.

I'm going to use a simple example. Basically, our schemas are:

  • Initial: A record with one field of type int
  • Second version: Same record, same field name but the type is now a union of null and int
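For readability, here are the same two schemas as plain Avro JSON (they appear again as escaped Java strings in the test below):

{
  "type": "record",
  "name": "NeighborComparisons",
  "fields": [
    { "name": "test", "type": "int" }
  ]
}

{
  "type": "record",
  "name": "NeighborComparisons",
  "fields": [
    { "name": "test", "type": ["null", "int"], "default": null }
  ]
}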

According to the schema resolution chapter of Avro's spec, the resolution for such a case should be:

if reader's is a union, but writer's is not
The first schema in the reader's union that matches the writer's schema is recursively resolved against it. If none match, an error is signalled.

My interpretation is that data serialized with the initial schema should resolve properly, since int is part of the union in the reader's schema.

However, when running a test that reads back a record serialized with version 1 using version 2, I get:

org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.

Here's a test that shows exactly this:

import static org.junit.Assert.assertEquals;

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.io.JsonEncoder;
import org.junit.Test;

@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
    Schema writerSchema = new Schema.Parser().parse("{\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [\n" +
            "      {\"name\": \"test\",\n" +
            "      \"type\": \"int\" }]} ");

    Schema readersSchema = new Schema.Parser().parse(" {\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [ {\n" +
            "      \"name\": \"test\",\n" +
            "      \"type\": [\"null\", \"int\"],\n" +
            "      \"default\": null } ]  }");

    // Writing a record using the initial schema with the 
    // test field defined as an int
    GenericData.Record record = new GenericData.Record(writerSchema);
    record.put("test", Integer.valueOf(10));        
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    JsonEncoder jsonEncoder = EncoderFactory.get().
       jsonEncoder(writerSchema, output);
    GenericDatumWriter<GenericData.Record> writer = new 
       GenericDatumWriter<GenericData.Record>(writerSchema);
    writer.write(record, jsonEncoder);
    jsonEncoder.flush();
    output.flush();

    System.out.println(output.toString());

    // We try reading it back using the second schema 
    // version where the test field is defined as a union of null and int
    JsonDecoder jsonDecoder = DecoderFactory.get().
        jsonDecoder(readersSchema, output.toString());
    GenericDatumReader<GenericData.Record> reader =
            new GenericDatumReader<GenericData.Record>(writerSchema, 
                readersSchema);
    GenericData.Record read = reader.read(null, jsonDecoder);

    // We should be able to assert that the value is 10 but it
    // fails on reading the record before getting here
    assertEquals(10, read.get("test"));
}

I would like to know whether my expectations are correct (this should resolve successfully, right?), or else where I'm not using Avro properly to handle such a scenario.

asked Aug 21 '12 by skippyjon

People also ask

How does Avro schema evolve?

Schema evolution allows you to update the schema used to write new data, while maintaining backwards compatibility with the schema(s) of your old data. Then you can read it all together, as if all of the data has one schema. Of course there are precise rules governing the changes allowed, to maintain compatibility.
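As a small illustration of one such rule (the User record and its fields are hypothetical, not from the question): adding an optional field with a default is an allowed change, because a reader using the new schema fills in the default when reading records written with the old one.

Old writer schema:

{"type": "record", "name": "User",
 "fields": [ {"name": "id", "type": "long"} ]}

New reader schema (an optional email field was added with a default, so old records resolve with email = null):

{"type": "record", "name": "User",
 "fields": [ {"name": "id", "type": "long"},
             {"name": "email", "type": ["null", "string"], "default": null} ]}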

Why is Avro better for schema evolution?

A common trait shared by these platforms is that they used Apache Avro to provide strong schema-on-write data contracts. Importantly, Avro also offers the ability for customers to safely and confidently evolve their data model definitions. After all — we should expect the shape of data to change over time.

What is Union in Avro schema?

A union indicates that a field might have more than one data type. For example, a union might indicate that a field can be a string or a null. A union is represented as a JSON array containing the data types.
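For example, a field that can hold either a string or null could be declared like this (a generic illustration, not one of the question's schemas):

{"name": "note", "type": ["null", "string"], "default": null}

Note that when a default is specified for a union-typed field, its type must match the first branch of the union, which is why null is conventionally listed first.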

What is Logicaltype in Avro schema?

Logical types specify a way of representing a high-level type as a base Avro type. For example, a date is specified as the number of days after the unix epoch (or before using a negative value). This enables extensions to Avro's type system without breaking binary compatibility.
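For example, the built-in date logical type from the Avro spec annotates the base int type:

{"type": "int", "logicalType": "date"}

Readers that know the logical type can surface the value as a date object; readers that don't simply see a plain int, which is what preserves binary compatibility.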


1 Answer

The expectation is correct: evolving a field from a primitive type to a union of null and that same primitive should resolve.

The problem with the code above is how the decoder is created. The decoder needs the writer's schema rather than the reader's schema: the decoder's job is to parse the input exactly as it was written, while schema resolution between the two versions is handled by the GenericDatumReader, which is given both schemas.

Rather than doing this:

JsonDecoder jsonDecoder = DecoderFactory.get().
    jsonDecoder(readersSchema, output.toString());

It should be like this:

JsonDecoder jsonDecoder = DecoderFactory.get().
    jsonDecoder(writerSchema, output.toString());
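Putting it together, the reading side of the test becomes (a minimal sketch reusing the variables from the question's test):

// The decoder parses the JSON exactly as it was written,
// so it is constructed with the writer's schema.
JsonDecoder jsonDecoder = DecoderFactory.get().
    jsonDecoder(writerSchema, output.toString());

// Schema resolution (writer -> reader) happens in the reader,
// which is given both schemas.
GenericDatumReader<GenericData.Record> reader =
    new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);

GenericData.Record read = reader.read(null, jsonDecoder);
assertEquals(10, read.get("test")); // now resolves: int matches the union's int branch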

Credit goes to Doug Cutting for the answer on Avro's user mailing list: http://mail-archives.apache.org/mod_mbox/avro-user/201208.mbox/browser

answered Nov 07 '22 by skippyjon