Read and write schema when using the python avro library


The Avro specification allows the writer and reader schemas to differ, provided they match. It also allows aliases to handle differences between the reader and writer schemas. The following Python 2.7 code tries to illustrate this.

import uuid
import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter


write_schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
     ]
}
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(write_schema))
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()

read_schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "first_name", "type": "string", "aliases": ["name"]},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}

# 1. open avro and extract passport + data
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
reader.close()

Running this code produces the following error:

/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/simonshapiro/python_beam/src/avrov_test.py
Traceback (most recent call last):
  File "/Users/simonshapiro/python_beam/src/avrov_test.py", line 67, in <module>
    writer.append({"name": "Alyssa", "favorite_number": 256})
  File "/Library/Python/2.7/site-packages/avro/datafile.py", line 196, in append
    self.datum_writer.write(datum, self.buffer_encoder)
  File "/Library/Python/2.7/site-packages/avro/io.py", line 768, in write
    if not validate(self.writers_schema, datum):
  File "/Library/Python/2.7/site-packages/avro/io.py", line 103, in validate
    schema_type = expected_schema.type
AttributeError: 'dict' object has no attribute 'type'

Process finished with exit code 1

When the reader is created without explicit schemas, using this line

reader = DataFileReader(open("users.avro", "rb"), DatumReader())

it works fine.
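
The last frame of the traceback reads `expected_schema.type`, which suggests the library expects a parsed schema object where the code above passes a plain dict. The following is a minimal sketch of that difference, not a full fix. It assumes the parser function is called `parse` (as in the Python 2 package used here) or `Parse` (as in the older avro-python3 package), and it skips the parsing step if the library is not installed. The one-field schema is a shortened stand-in for the one above.

```python
import json

schema_dict = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}],
}

# A dict exposes keys, not attributes, so attribute access fails
# with the same message as in the traceback.
try:
    schema_dict.type
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The parser takes a JSON string, so the dict is serialized first.
schema_json = json.dumps(schema_dict)

try:
    import avro.schema
    # Assumption: `parse` in the Python 2 package and recent releases,
    # `Parse` in the older avro-python3 package.
    parse = getattr(avro.schema, "parse", None) or getattr(avro.schema, "Parse")
    parsed_schema = parse(schema_json)
    print(parsed_schema.type)
except ImportError:
    parsed_schema = None  # library not installed; nothing to parse with
```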

asked Jun 11 '17 19:06

user2302244

1 Answer

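After some more work I discovered that the schemas were not set up correctly. They have to be parsed into schema objects with avro.schema.parse (which takes a JSON string) rather than passed as plain dicts, and the writer schema is given to DataFileWriter as its third argument rather than to DatumWriter. I also added a default to the aliased first_name field. This code works as intended: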

import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter


write_schema = avro.schema.parse(json.dumps({
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
     ]
}))

# The parsed writer schema is passed to DataFileWriter, not to DatumWriter.
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), write_schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()

read_schema = avro.schema.parse(json.dumps({
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "first_name", "type": "string", "default": "", "aliases": ["name"]},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}))

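# Read the file back, giving DatumReader both the writer and the reader schema.
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
# Schema stored in the file's metadata (not used further here).
new_schema = reader.get_meta("avro.schema")
# Collect every record produced by the reader.
users = []
for user in reader:
    users.append(user)
reader.close()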
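
To make the alias rule from the specification concrete, here is a rough plain-Python sketch of how a reader field can be matched to a written field by name or alias, falling back to the reader's default when nothing matches. The function `resolve_record` is hypothetical and written only for illustration. It does not call the avro library, ignores types, unions and namespaces, and makes no claim about how the library itself implements aliases.

```python
def resolve_record(written, reader_fields):
    """Map a record written under the writer schema onto reader field names."""
    resolved = {}
    for field in reader_fields:
        # Try the reader's own field name first, then each of its aliases.
        candidates = [field["name"]] + field.get("aliases", [])
        for candidate in candidates:
            if candidate in written:
                resolved[field["name"]] = written[candidate]
                break
        else:
            # No written field matched: the reader must supply a default.
            if "default" not in field:
                raise KeyError("no value or default for " + field["name"])
            resolved[field["name"]] = field["default"]
    return resolved


reader_fields = [
    {"name": "first_name", "type": "string", "default": "", "aliases": ["name"]},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]},
]
written = {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

result = resolve_record(written, reader_fields)
print(result["first_name"])
```

In this sketch the written "name" value ends up under "first_name" because of the alias. Without the alias, the default "" would be used instead, so checking the values in users is a quick way to see which of the two happened in the real library.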

answered Oct 13 '22 18:10

user2302244

Tags: python, python-2.7, avro
