
How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?

I'm trying to validate a JSON file using an Avro schema and write the corresponding Avro file. First, I've defined the following Avro schema named user.avsc:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "user",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

Then created a user.json file:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null} 

And then tried to run:

java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro 

But I get the following exception:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT"?

Emre Sevinç asked Dec 15 '14 13:12



2 Answers

According to the explanation by Doug Cutting,

Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON, the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.

http://avro.apache.org/docs/current/spec.html#json_encoding

Thus your record must be encoded as:

{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
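
If you have many plain JSON records, the tagging rule can be applied mechanically before calling fromjson. The following is only a minimal sketch in Python, not part of avro-tools: the helper name tag_unions and the CHECKS table are made up for illustration, and it assumes a flat record whose unions contain only primitive types (no bytes, nested records, arrays, or maps). Where a union is ambiguous, such as ["int", "long"], it simply picks the first branch that fits, which is exactly the guess Avro's own JSON decoder refuses to make.

```python
import json


def _is_int32(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31


def _is_int64(v):
    return isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


# Hypothetical helper table: which plain JSON values could belong to which
# Avro primitive branch. "bytes" and complex types are deliberately left out.
CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_int32,
    "long": _is_int64,
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
}


def tag_unions(record, schema):
    """Return a copy of record with non-null union values wrapped as {type: value}."""
    tagged = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = record[name]
        if isinstance(ftype, list) and value is not None:
            # Assumption: the first branch that fits wins.
            branch = next(
                (t for t in ftype
                 if isinstance(t, str) and t in CHECKS and CHECKS[t](value)),
                None,
            )
            if branch is None:
                raise ValueError(f"no union branch of {ftype} fits field {name!r}")
            tagged[name] = {branch: value}
        else:
            # Non-union fields and null values are written unchanged.
            tagged[name] = value
    return tagged


schema = json.loads("""
{"namespace": "example.avro", "type": "record", "name": "user",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["int", "null"]},
   {"name": "favorite_color", "type": ["string", "null"]}
 ]}
""")
record = json.loads('{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}')

print(json.dumps(tag_unions(record, schema)))
# prints: {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
```

The printed line can be saved as user.json and passed to the fromjson command from the question. Note that null stays untagged, as in the hand-written record above.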
Emre Sevinç answered Sep 28 '22 19:09


There is a new JSON encoder in the works that should address this common issue:

https://issues.apache.org/jira/browse/AVRO-1582

https://github.com/zolyfarkas/avro

ppearcy answered Sep 28 '22 21:09