According to the definition of the "default" attribute in the Avro docs: "A default value for this field, used when reading instances that lack this field (optional)."
I took this to mean that if the corresponding field is missing, the default value is used.
But that does not seem to be the case. Consider the following student schema:
{
  "type": "record",
  "namespace": "com.example",
  "name": "Student",
  "fields": [
    {
      "name": "age",
      "type": "int",
      "default": -1
    },
    {
      "name": "name",
      "type": "string",
      "default": "null"
    }
  ]
}
The schema says that if the "age" field is missing, its value should be taken as -1, and likewise the "name" field should fall back to its default.
Now, if I try to construct a Student from the following JSON:
{"age":70}
I get this exception:
org.apache.avro.AvroTypeException: Expected string. Got END_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:227)
It looks like the default is NOT working as expected. So what exactly is the role of the default here?
This is the code used to generate Student model:
Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, studentJson);
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
return datumReader.read(null, decoder);
(The Student class is auto-generated by the Avro compiler from the student schema.)
In this case, the type of the key1 field is a union (type: ["null", "string"] in the Avro schema). If the key1 field is missing from the source data, or its value is null, null is automatically filled in as the default value.
Avro serializers and deserializers operate on fields in the order they are declared. Producers and consumers must use compatible schemas, including the field order, so do not change the order of Avro fields.
The use of Avro schemas allows serialized values to be stored in a very space-efficient binary format. Each value is stored without any metadata other than a small internal schema identifier, between 1 and 4 bytes in size. One such reference is stored per key-value pair.
JSON Schema can describe a much broader set of data than Avro (Avro can only have strings in enums, for instance, while enums in JSON Schema can have any JSON value); but Avro has notions which are not available in JSON (property order in records, binary types).
I think there is some misunderstanding around default values, so hopefully my explanation will help other people as well. The default supplies a value when the field is not present, but this happens while the Avro object is being instantiated (in your case, in the call to datumReader.read). On its own it does not let you read data that was written with a different schema, which is why the concept of a "schema registry" is useful in this kind of situation.
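To make the rule concrete, here is a minimal plain-Java sketch of how resolution chooses between the data and the default. It is a toy model and not the Avro API: DefaultResolutionSketch, ReaderField and resolve are names I made up for illustration, and the schemas are reduced to field names. The idea it models is that a default is only consulted for a field that the reader schema has and the writer schema lacks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of reader/writer resolution; hypothetical names, not part of Avro.
class DefaultResolutionSketch {

    // A reader-schema field: its name and, optionally, a default value.
    static final class ReaderField {
        final String name;
        final Object defaultValue;
        final boolean hasDefault;

        ReaderField(String name, Object defaultValue, boolean hasDefault) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.hasDefault = hasDefault;
        }
    }

    // writerFields: field names the data was written with.
    // readerFields: fields the reading application expects.
    static Map<String, Object> resolve(Set<String> writerFields,
                                       List<ReaderField> readerFields,
                                       Map<String, Object> data) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (writerFields.contains(field.name)) {
                // The writer schema declares the field, so the data must carry it.
                if (!data.containsKey(field.name)) {
                    throw new IllegalStateException("Expected field '" + field.name + "' in the data");
                }
                result.put(field.name, data.get(field.name));
            } else if (field.hasDefault) {
                // Only here, when the writer schema lacks the field, is the default used.
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No default for missing field '" + field.name + "'");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ReaderField> reader = Arrays.asList(
                new ReaderField("age", -1, true),
                new ReaderField("name", "null", true));
        Map<String, Object> data = new HashMap<>();
        data.put("age", 70);

        // Writer schema without "name": the default is applied.
        System.out.println(resolve(Collections.singleton("age"), reader, data));

        // Writer schema with "name": the data is expected to contain it, so this fails.
        try {
            resolve(new HashSet<>(Arrays.asList("age", "name")), reader, data);
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

The second call in main mirrors the situation in the question: the same schema is used for both roles, so "name" is expected in the input and its default is never looked at.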
The following code works and lets you read your data:
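// The decoder parses the JSON input.
Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, "{\"age\":70}");
// The reader (expected) schema comes from the generated Student class.
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
// Writer schema: describes the data as it was actually written, without "name".
Schema writerSchema = new Schema.Parser().parse("{\n" +
        "  \"type\": \"record\",\n" +
        "  \"namespace\": \"com.example\",\n" +
        "  \"name\": \"Student\",\n" +
        "  \"fields\": [{\n" +
        "    \"name\": \"age\",\n" +
        "    \"type\": \"int\",\n" +
        "    \"default\": -1\n" +
        "  }\n" +
        "  ]\n" +
        "}");
datumReader.setSchema(writerSchema);
// "name" is absent from the writer schema, so it is filled from the reader schema's default.
System.out.println(datumReader.read(null, decoder));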
As you can see, I am specifying the schema used to "write" the JSON input, which does not contain the field "name". Because your schema declares a default for it, when you print the record you will see "name" with your default value:
{"age": 70, "name": "null"}
Just in case you do not already know: "null" here is not really a null value, it is a string with the value "null".
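If a real null is what you want for a missing name, the usual approach is to declare the field as a union with "null" listed first and use null (without quotes) as the default. A sketch of the field declaration, assuming "name" is meant to be nullable, which is my assumption and not something the question states:

```json
{
  "name": "name",
  "type": ["null", "string"],
  "default": null
}
```

The default of a union field has to match the first type in the union, which is why "null" comes before "string" here.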