The avro specification allows using different write and read schema provided they match. The specification further allows aliases to cater for differences between the read and write schema. The following python 2.7 tries to illustrate this.
import uuid
import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
write_schema = {
"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(write_schema))
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
read_schema = {
"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "first_name", "type": "string", "aliases": ["name"]},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
# 1. open avro and extract passport + data
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
reader.close()
This code has the following error message:
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/simonshapiro/python_beam/src/avrov_test.py
Traceback (most recent call last):
File "/Users/simonshapiro/python_beam/src/avrov_test.py", line 67, in <module>
writer.append({"name": "Alyssa", "favorite_number": 256})
File "/Library/Python/2.7/site-packages/avro/datafile.py", line 196, in append
self.datum_writer.write(datum, self.buffer_encoder)
File "/Library/Python/2.7/site-packages/avro/io.py", line 768, in write
if not validate(self.writers_schema, datum):
File "/Library/Python/2.7/site-packages/avro/io.py", line 103, in validate
schema_type = expected_schema.type
AttributeError: 'dict' object has no attribute 'type'
Process finished with exit code 1
When it is run without different schema using this line
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
it works fine.
Even if you install the correct Avro package for your Python environment, the API differs between avro and avro-python3 . As an example, for Python 2 (with avro package), you need to use the function avro. schema. parse but for Python 3 (with avro-python3 package), you need to use the function avro.
Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages; one for human editing (Avro IDL) and another which is more machine-readable based on JSON.
Creating Avro Schemas type − This field comes under the document as well as the under the field named fields. In case of document, it shows the type of the document, generally a record because there are multiple fields. When it is field, the type describes data type.
Well after some more work I have discovered that the schemas were not set up correctly. This code works as intended:
import uuid
import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
write_schema = avro.schema.parse(json.dumps({
"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}))
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), write_schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
read_schema = avro.schema.parse(json.dumps({
"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "first_name", "type": "string", "default": "", "aliases": ["name"]},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}))
# 1. open avro and extract passport + data
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
new_schema = reader.get_meta("avro.schema")
users = []
for user in reader:
users.append(user)
reader.close()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With