Is there a way to convert a JSON string to an Avro without a schema definition in Python? Or is this something only Java can handle?
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
An Avro schema is created using JSON format. JSON is short for JavaScript Object Notation, and it is a lightweight, text-based data interchange format that is intended to be easy for humans to read and write.
The JSON file is converted to CSV file using "dataframe. write. csv("path")" function.
It is based on a subset of the JavaScript Programming Language. Avro can be classified as a tool in the "Serialization Frameworks" category, while JSON is grouped under "Languages". Redsift, OTTLabs, and Mon Style are some of the popular companies that use JSON, whereas Avro is used by Liferay, LendUp, and BetterCloud.
I recently had the same problem, and I ended up developing a python package that can take any python data structure, including parsed JSON and store it in Avro without a need for a dedicated schema.
I tested it for python 3.
You can install it as pip3 install rec-avro
or see the code and docs at https://github.com/bmizhen/rec-avro
Usage Example:
from fastavro import writer, reader, schema
from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema
def json_objects():
return [{'a': 'a'}, {'b':'b'}]
# For efficiency, to_rec_avro_destructive() destroys rec, and reuses it's
# data structures to construct avro_objects
avro_objects = (to_rec_avro_destructive(rec) for rec in json_objects())
# store records in avro
with open('json_in_avro.avro', 'wb') as f_out:
writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)
#load records from avro
with open('json_in_avro.avro', 'rb') as f_in:
# For efficiency, from_rec_avro_destructive(rec) destroys rec, and
# reuses it's data structures to construct it's output
loaded_json = [from_rec_avro_destructive(rec) for rec in reader(f_in)]
assert loaded_json == json_objects()
To convert a JSON string to json objects use json.loads('{"a":"b"}')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With