Say you have this AVDL as a simplified example:
@namespace("example.avro")
protocol User {
    record Man {
        int age;
    }
    record Woman {
        int age;
    }
    record User {
        union {
            Man,
            Woman
        } user_info;
    }
}
In Python, you cannot state the record type when serializing such an object, because this syntax is not allowed:
{"user_info": {"Woman": {"age": 18}}}
The only object that gets serialized is
{"user_info": {"age": 18}}
which loses all the type information: the DatumWriter usually picks the first record in the union that matches the set of fields, in this case a Man.
The same case works perfectly well when using the Java API.
So, what am I doing wrong here? Is it possible that serialization and deserialization are not idempotent in Python's Avro implementation?
Apache Avro is a language-independent, schema-based data serialization system. It uses a schema for both serialization and deserialization, and the schema itself is specified in JSON.
A union indicates that a field might have more than one data type. For example, a union might indicate that a field can be a string or a null. A union is represented as a JSON array containing the data types.
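To make the union problem concrete, here is a minimal standard-library sketch of how the Avro binary encoding writes a union value: first the zero-based index of the chosen branch, then the value itself. The helper names (`zigzag_varint`, `encode_user_info`, `UNION_BRANCHES`) are made up for this illustration and are not part of any Avro library; the branch order is assumed to follow the schema in the question.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an int as a zigzag varint, as Avro's binary format does for int/long."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


# Assumed branch order of the user_info union: Man first, Woman second.
UNION_BRANCHES = ["Man", "Woman"]


def encode_user_info(branch: str, age: int) -> bytes:
    """Encode a User record: union branch index, then the record's single int field."""
    index = UNION_BRANCHES.index(branch)
    return zigzag_varint(index) + zigzag_varint(age)


as_woman = encode_user_info("Woman", 18)
as_man = encode_user_info("Man", 18)
print(as_woman.hex(), as_man.hex())
```

The two encodings differ only in the leading branch index, so a writer that silently picks the first matching branch produces valid bytes that decode as a Man.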
You are correct that the standard avro library has no way to specify which schema to use in cases like this. However, fastavro (an alternative implementation) does have a way to do this. In that implementation, a record can be specified as a tuple where the first value is the schema name and the second value is the actual record data. The record would look like this:
{"user_info": ("Woman", {"age": 18})}
Here's an example script:
from io import BytesIO
from fastavro import writer

# The user_info field is a union of two records with identical fields.
schema = {
    "type": "record",
    "name": "User",
    "fields": [{
        "name": "user_info",
        "type": [
            {
                "type": "record",
                "name": "Man",
                "fields": [{
                    "name": "age",
                    "type": "int"
                }]
            },
            {
                "type": "record",
                "name": "Woman",
                "fields": [{
                    "name": "age",
                    "type": "int"
                }]
            }
        ]
    }]
}

# The tuple names the union branch explicitly: (record name, record data).
records = [{"user_info": ("Woman", {"age": 18})}]
bio = BytesIO()
writer(bio, schema, records)