I am running into some issues setting up default values for Avro fields. I have a simple schema as given below:
data.avsc:
{
  "namespace": "test",
  "type": "record",
  "name": "Data",
  "fields": [
    { "name": "id", "type": [ "long", "null" ] },
    { "name": "value", "type": [ "string", "null" ] },
    { "name": "raw", "type": [ "bytes", "null" ] }
  ]
}
I am using the avro-maven-plugin v1.7.6 to generate the Java model.
When I create an instance of the model using
Data data = Data.newBuilder().build();
it fails with an exception:
org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field id type:UNION pos:0 not set and has no default value.
But if I specify the "default" property,
{ "name": "id", "type": [ "long", "null" ], "default": "null" },
I do not get this error. I read in the documentation that the first schema in the union becomes the default schema. So my question is, why do I still need to specify the "default" property? How else do I make a field optional?
And if I do need to specify default values, how does that work for a union? Do I need to specify a default value for each schema in the union, and how does that work in terms of order and syntax?
Thanks.
Default Values and Logical Types: Default values are one use case for unions, which let a field take values of more than one type. By default, no field in an Avro schema is nullable. Example, making middle_name nullable: { "name": "middle_name", "type": ["null", "string"], "default": null }
To model the set of fields within a schema, Avro supports the following primitive types: null: No value. boolean: Binary value.
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
Avro serializers/deserializers operate on fields in the order they are declared. Producers and consumers must use compatible schemas, including the field order. Do not change the order of Avro fields. All producers and consumers must be updated at the same time if you change the field order.
The default value of a union corresponds to the first schema of the union (Source). Your union is defined as ["long", "null"], therefore the default value must be a long number. null is not a long number, which is why you are getting an error.
If you still want to define null as the default value, put the null schema first, i.e. change the union to ["null", "long"] instead.
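As a rough illustration, the schema from the question could be rewritten along these lines. This is only a sketch, assuming all three fields are meant to be optional and should default to null. Note that the default is the JSON literal null, not the string "null", and that a single default is given per field (matching the first schema in the union), not one per branch.

```json
{
  "namespace": "test",
  "type": "record",
  "name": "Data",
  "fields": [
    { "name": "id",    "type": [ "null", "long" ],   "default": null },
    { "name": "value", "type": [ "null", "string" ], "default": null },
    { "name": "raw",   "type": [ "null", "bytes" ],  "default": null }
  ]
}
```

With a schema shaped like this, Data.newBuilder().build() should be able to fill in the unset fields from their defaults. If a non-null default is wanted instead, the union would keep the value type first, for example [ "long", "null" ] with "default": 0.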