
Reading Avro file gives AvroTypeException: missing required field error (even though the new field is declared null in schema)

Tags: java, hadoop, avro

I am trying to deserialize/read an Avro data file that does not contain a newly added field. The new field is declared as nullable in the schema, so I expected it to be optional, but the reader still treats it as mandatory and fails with this error:

Exception in thread "main" org.apache.avro.AvroTypeException: Found com.kiran.avro.User, expecting com.kiran.avro.User, missing required field loc

The Avro schema declaration of the new field:

{"name": "loc", "type": ["string", "null"]}

The code used to read the file:

DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);

Is there any other way to declare an optional field?

Thanks for any hints/suggestions!

KiranM asked Aug 04 '16 18:08



1 Answer

What is the content of "file"?

I might be wrong, but if you define a field in the schema as {"name": "loc", "type": ["string", "null"]}, the field is nullable, not optional: every record in the file still has to carry a loc value, even if that value is null. It should be something like "loc": null in the file.

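Try adding "default" to this field declaration. Note that "null" is listed first in the union, because a union field's default value must match the first type in the union: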

{"name" : "loc",
"type" :  ["null","string"] ,
"default" : null}

Then it should be possible to omit this field in the file: when a record was written without loc, the reader fills in the default instead of failing.
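To make the difference between "nullable" and "optional" concrete, here is a minimal toy model of the rule described above. It is not Avro's API or implementation, only a plain-Java sketch: the `FieldResolutionSketch` class, its `Field` type, and the `name` field are hypothetical and exist purely for illustration. A field missing from the written record is filled from its default if one is declared, and is an error otherwise.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldResolutionSketch {

    /** Hypothetical stand-in for a reader-schema field (not an Avro class). */
    public static final class Field {
        final String name;
        final boolean hasDefault;   // true even when the default value itself is null
        final Object defaultValue;

        private Field(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        /** Models {"name": ..., "type": [...]} with no "default" key. */
        public static Field required(String name) {
            return new Field(name, false, null);
        }

        /** Models {"name": ..., "type": [...], "default": value}. */
        public static Field withDefault(String name, Object value) {
            return new Field(name, true, value);
        }
    }

    /** Builds the record the reader sees from the record that was written. */
    public static Map<String, Object> resolve(Map<String, Object> written,
                                              List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : readerFields) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));   // value is in the file
            } else if (f.hasDefault) {
                result.put(f.name, f.defaultValue);        // fall back to the default
            } else {
                // Nullable type or not, there is nothing to fall back to.
                throw new IllegalStateException("missing required field " + f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A record written before "loc" existed in the schema.
        Map<String, Object> written = Map.of("name", "kiran");

        // Reader schema where loc has "default": null, so resolution succeeds.
        System.out.println(resolve(written,
                List.of(Field.required("name"), Field.withDefault("loc", null))));

        // Reader schema where loc is only nullable, so resolution fails.
        try {
            resolve(written, List.of(Field.required("name"), Field.required("loc")));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The key design point is that `hasDefault` is tracked separately from `defaultValue`: a default of null and the absence of a default are different things, which is exactly why declaring the type as a union with "null" is not enough on its own.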

See also the question "Avro: deserialize json - schema with optional fields" for some additional info and examples.

RadioLog answered Sep 28 '22 06:09