Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert JSON string to Avro in Python?

Tags:

python

avro

Is there a way to convert a JSON string to an Avro without a schema definition in Python? Or is this something only Java can handle?

like image 629
Rolando Avatar asked Mar 13 '14 14:03

Rolando


People also ask

What is JSON encoded Avro?

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.

Is Avro JSON format?

An Avro schema is created using JSON format. JSON is short for JavaScript Object Notation, and it is a lightweight, text-based data interchange format that is intended to be easy for humans to read and write.

How do I convert JSON data to CSV in Pyspark?

The JSON file is converted to CSV file using "dataframe. write. csv("path")" function.

How is Avro different from JSON?

It is based on a subset of the JavaScript Programming Language. Avro can be classified as a tool in the "Serialization Frameworks" category, while JSON is grouped under "Languages". Redsift, OTTLabs, and Mon Style are some of the popular companies that use JSON, whereas Avro is used by Liferay, LendUp, and BetterCloud.


1 Answers

I recently had the same problem, and I ended up developing a python package that can take any python data structure, including parsed JSON and store it in Avro without a need for a dedicated schema.

I tested it for python 3.

You can install it as pip3 install rec-avro or see the code and docs at https://github.com/bmizhen/rec-avro

Usage Example:

from fastavro import writer, reader, schema
from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

def json_objects():
    return [{'a': 'a'}, {'b':'b'}]

# For efficiency, to_rec_avro_destructive() destroys rec, and reuses it's
# data structures to construct avro_objects 
avro_objects = (to_rec_avro_destructive(rec) for rec in json_objects())

# store records in avro
with open('json_in_avro.avro', 'wb') as f_out:
    writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)

#load records from avro
with open('json_in_avro.avro', 'rb') as f_in:
    # For efficiency, from_rec_avro_destructive(rec) destroys rec, and 
    # reuses it's data structures to construct it's output
    loaded_json = [from_rec_avro_destructive(rec) for rec in reader(f_in)]

assert loaded_json == json_objects()

To convert a JSON string to json objects use json.loads('{"a":"b"}')

like image 199
boriska Avatar answered Sep 28 '22 00:09

boriska