In Python 2.7, using Avro, I'd like to encode an object to a byte array.
All examples I've found write to a file.
I've tried using io.BytesIO() but this gives:
AttributeError: '_io.BytesIO' object has no attribute 'write_long'
Sample using io.BytesIO:
def avro_encode(raw, schema):
    writer = DatumWriter(schema)
    avro_buffer = io.BytesIO()
    writer.write(raw, avro_buffer)
    return avro_buffer.getvalue()
fastavro is an alternative implementation that is much faster. It iterates over the same 10K records in 2.9 seconds, and with PyPy it does so in 1.5 seconds (to be fair, the Java benchmark does some extra JSON encoding/decoding).
Going from Avro to a Pandas DataFrame is also a three-step process. Create a list to store the records: this list will hold dictionary objects that you can later convert to a Pandas DataFrame. Read and parse the Avro file: use fastavro.reader() to read the file, then iterate over the records. Finally, convert the list to a DataFrame.
Avro serializes data that has a built-in schema into a compact binary format, which can be deserialized by any application. It uses JSON to declare the data structures. Presently, it supports languages such as Java, C, C++, C#, Python, and Ruby.
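To make "compact binary format" a bit more concrete: the write_long named in the error message from the question is the encoder method that emits Avro's variable-length zig-zag integers. The pure-Python sketch below hand-encodes a single record (a string field followed by two union fields, ["int", "null"] and ["string", "null"]) without the avro library. It only illustrates the wire format as I understand it from the Avro specification; the helper names write_long and write_string are my own for this example, and real code should go through the library's BinaryEncoder.

```python
import io


def write_long(stream, value):
    """Write an integer as a zig-zag varint (illustrative helper, not the avro API)."""
    # Zig-zag maps signed values to unsigned ones so small magnitudes stay short:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    zigzag = (value << 1) ^ (value >> 63)
    # Emit 7 bits per byte, lowest group first; a set high bit means "more follows".
    while zigzag & ~0x7F:
        stream.write(bytes(bytearray([(zigzag & 0x7F) | 0x80])))
        zigzag >>= 7
    stream.write(bytes(bytearray([zigzag])))


def write_string(stream, text):
    """Write a string as a long byte count followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    write_long(stream, len(data))
    stream.write(data)


buf = io.BytesIO()
write_string(buf, u"Alyssa")  # field "name"
write_long(buf, 0)            # union branch 0 ("int") of ["int", "null"]
write_long(buf, 256)          # field "favorite_number"
write_long(buf, 1)            # union branch 1 ("null"); null itself takes zero bytes
encoded = buf.getvalue()
print(repr(encoded))
```

A plain io.BytesIO only knows how to write raw bytes, which is why handing it straight to DatumWriter.write() fails: the writer expects an object that offers write_long and its siblings.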
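Your question helped me figure things out, so thanks. Here's a simple Python example based on the Python example in the docs. The key is to wrap the BytesIO in an avro.io.BinaryEncoder: DatumWriter.write() expects an encoder (which provides write_long), not a raw stream.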
import io
import avro.schema
import avro.io

test_schema = '''
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
'''

schema = avro.schema.parse(test_schema)

# Encode: DatumWriter writes through a BinaryEncoder that wraps the buffer.
writer = avro.io.DatumWriter(schema)
bytes_writer = io.BytesIO()
encoder = avro.io.BinaryEncoder(bytes_writer)
writer.write({"name": "Alyssa", "favorite_number": 256}, encoder)
writer.write({"name": "Ben", "favorite_number": 7, "favorite_color": "red"}, encoder)

raw_bytes = bytes_writer.getvalue()
print(len(raw_bytes))
print(type(raw_bytes))

# Decode: read the two records back in the order they were written.
bytes_reader = io.BytesIO(raw_bytes)
decoder = avro.io.BinaryDecoder(bytes_reader)
reader = avro.io.DatumReader(schema)
user1 = reader.read(decoder)
user2 = reader.read(decoder)

print(user1)
print(user2)