
Encode an object with Avro to a byte array in Python

In Python 2.7, using Avro, I'd like to encode an object to a byte array.

All examples I've found write to a file.

I've tried using io.BytesIO() but this gives:

AttributeError: '_io.BytesIO' object has no attribute 'write_long'

Sample using io.BytesIO

def avro_encode(raw, schema):
    writer = DatumWriter(schema)
    avro_buffer = io.BytesIO()
    writer.write(raw, avro_buffer)
    return avro_buffer.getvalue()
asked May 12 '14 16:05 by Grant Overby



1 Answer

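Your question helped me figure things out, so thanks. DatumWriter.write() expects an encoder rather than a raw stream, so wrap the io.BytesIO in an avro.io.BinaryEncoder. Here's a simple Python example based on the Python example in the docs: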

import io
import avro.schema
import avro.io

test_schema = '''
{
"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
'''

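# Parse the JSON schema and build a writer bound to it.
schema = avro.schema.parse(test_schema)
writer = avro.io.DatumWriter(schema)

# DatumWriter.write() needs an encoder, not a raw stream, so wrap the
# in-memory buffer in a BinaryEncoder.
bytes_writer = io.BytesIO()
encoder = avro.io.BinaryEncoder(bytes_writer)
writer.write({"name": "Alyssa", "favorite_number": 256}, encoder)
writer.write({"name": "Ben", "favorite_number": 7, "favorite_color": "red"}, encoder)

# The encoded records are now available as a byte string.
raw_bytes = bytes_writer.getvalue()
print(len(raw_bytes))
print(type(raw_bytes))

# Decode: wrap the bytes in a BinaryDecoder and read the records back in order.
bytes_reader = io.BytesIO(raw_bytes)
decoder = avro.io.BinaryDecoder(bytes_reader)
reader = avro.io.DatumReader(schema)
user1 = reader.read(decoder)
user2 = reader.read(decoder)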

print(user1)
print(user2)
answered Oct 25 '22 23:10 by ppearcy
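
As a rough illustration of why the bare io.BytesIO in the question fails: DatumWriter calls encoder-style methods such as write_long on whatever object it is given, and a plain byte stream only offers write. The sketch below uses a hypothetical TinyEncoder (not part of the avro package; the names TinyEncoder and encode_long are made up here) that wraps a stream and adds write_long using the zigzag, variable-length integer layout described in the Avro specification. It is a simplified stand-in using only the standard library, not the library's actual implementation.

```python
import io


class TinyEncoder(object):
    """Hypothetical stand-in for an Avro binary encoder (illustration only)."""

    def __init__(self, stream):
        # Any object with a write(bytes) method, e.g. io.BytesIO.
        self.stream = stream

    def write_long(self, n):
        # Zigzag-map the signed value so small magnitudes stay small
        # (0 -> 0, -1 -> 1, 1 -> 2, ...). Assumes n fits in 64 bits.
        n = (n << 1) ^ (n >> 63)
        # Emit 7 bits per byte, low bits first; a set high bit means
        # "more bytes follow".
        while n & ~0x7F:
            self.stream.write(bytearray([(n & 0x7F) | 0x80]))
            n >>= 7
        self.stream.write(bytearray([n]))

    def write_utf8(self, text):
        # A string is its UTF-8 byte length (as a long) followed by the bytes.
        data = text.encode("utf-8")
        self.write_long(len(data))
        self.stream.write(data)


def encode_long(n):
    """Encode a single integer and return the resulting bytes."""
    buf = io.BytesIO()
    TinyEncoder(buf).write_long(n)
    return buf.getvalue()


def encode_string(text):
    """Encode a single string and return the resulting bytes."""
    buf = io.BytesIO()
    TinyEncoder(buf).write_utf8(text)
    return buf.getvalue()
```

With this sketch, encode_long(7) yields the single byte 0x0e and encode_long(256) yields the two bytes 0x80 0x04, while hasattr(io.BytesIO(), "write_long") is False, which matches the AttributeError in the question. For real use, the avro.io.BinaryEncoder shown in the answer is the thing to wrap the buffer in.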