
Avro field default values

Tags: java, maven, avro

I am running into some issues setting up default values for Avro fields. I have a simple schema as given below:

data.avsc:

{
  "namespace": "test",
  "type": "record",
  "name": "Data",
  "fields": [
    { "name": "id", "type": [ "long", "null" ] },
    { "name": "value", "type": [ "string", "null" ] },
    { "name": "raw", "type": [ "bytes", "null" ] }
  ]
}

I am using the avro-maven-plugin v1.7.6 to generate the Java model.

When I create an instance of the model with Data data = Data.newBuilder().build(); it fails with an exception:

org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field id type:UNION pos:0 not set and has no default value.

But if I specify the "default" property,

{ "name": "id", "type": [ "long", "null" ], "default": "null" }, 

I do not get this error. I read in the documentation that the first schema in the union becomes the default schema. So my question is: why do I still need to specify the "default" property? How else do I make a field optional?

And if I do need to specify the default values, how does that work for a union; do I need to specify default values for each schema in the union and how does that work in terms of order/syntax?

Thanks.

asked Apr 08 '14 13:04 by Kesh




1 Answer

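The default value of a union must match the first schema of the union (per the Avro specification). Your union is defined as ["long", "null"], therefore its default value must be a long. null is not a long, so it is not a valid default for that union. Note also that a field has no default at all unless you give it a "default" property, which is why the builder fails when the property is omitted.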

If you still want null as the default value, put the null schema first, i.e. change the union to ["null", "long"], and set "default": null (the JSON literal null, not the string "null").
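As a sketch of what that could look like for the schema in the question, here is data.avsc with null listed first in each union and a JSON null default on every field. Applying the same change to value and raw is an assumption about what the asker wants, since the question only shows the error for id.

```json
{
  "namespace": "test",
  "type": "record",
  "name": "Data",
  "fields": [
    { "name": "id",    "type": [ "null", "long" ],   "default": null },
    { "name": "value", "type": [ "null", "string" ], "default": null },
    { "name": "raw",   "type": [ "null", "bytes" ],  "default": null }
  ]
}
```

With a schema like this, Data.newBuilder().build() should succeed and leave all three fields null after the model is regenerated. If you would rather keep long first in the union, the default has to be a long instead, for example "default": 0.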

answered Sep 20 '22 11:09 by Y.H.