We have been researching this for hours now, with no luck. There are many ways to serialise and deserialise objects in Python, but we need a simple and standard one that respects type hints, for example:
from typing import List, NamedTuple

class Address(object):
    city: str
    postcode: str

class Person(NamedTuple):
    name: str
    addresses: List[Address]
My ask is extremely simple: I am looking for a standard way to convert to and from JSON, without the need to write the serialisation/deserialisation code for every class, for example:
json = '{ "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}'
I need a way to:
p = magic(json, Person) # or something similar
print(type(p)) # should print Person
for a in p.addresses:
    print(type(a)) # prints Address
    print(a.city) # should print London then Paris
json2 = unmagic(p)
print(json2 == json) # prints True (there will probably be differences in spacing, but just to clarify the idea)
I have worked in programming for 15 years and have been using Python for a year, and even after extensive research I am still not sure what the best way is to simply serialise/deserialise a structure of POCO objects. I feel dumb.
Edit
Options explored so far have one or more of the following limitations:
I generally use the Marshmallow project to handle JSON serialisation, deserialisation, and validation. When combined with marshmallow-dataclass or, when using SQLAlchemy database models, marshmallow-sqlalchemy, you can produce Marshmallow schemas straight from existing object definitions. You work with instances of the model themselves, so dataclass-defined class instances or SQLAlchemy ORM model instances.
Marshmallow schemas also let you define what happens with extra values in the JSON document; you can ignore these, or throw an exception for them, and vary this per model (models can be nested as needed). You can reuse schemas for subsets of the fields too.
Your small sample model, using marshmallow-dataclass, could be defined as:
import marshmallow
from marshmallow_dataclass import dataclass
from typing import List

class BaseSchema(marshmallow.Schema):
    class Meta:
        unknown = marshmallow.EXCLUDE

@dataclass(base_schema=BaseSchema)
class Address:
    city: str
    postcode: str

@dataclass(base_schema=BaseSchema)
class Person:
    name: str
    addresses: List[Address]
Apart from running pip install marshmallow-dataclass before attempting to run the above, that's it. This example uses an explicit base schema to set the unknown configuration to EXCLUDE, which means: ignore extra attributes in the JSON when loading.
To either deserialise from JSON data, or to serialise to JSON, create an instance of the schema; each dataclass class has a Schema attribute referencing the corresponding (generated) Marshmallow schema object:
>>> schema = Person.Schema()
>>> json = '{ "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}'
>>> p = schema.loads(json)
>>> p
Person(name='John', addresses=[Address(city='London', postcode='EC2 2FA'), Address(city='Paris', postcode='545887')])
>>> print(type(p)) # should print Person
<class '__main__.Person'>
>>> for a in p.addresses:
...     print(type(a)) # prints Address
...     print(a.city) # should print London then Paris
...
<class '__main__.Address'>
London
<class '__main__.Address'>
Paris
>>> schema.dumps(p)
'{"name": "John", "addresses": [{"postcode": "EC2 2FA", "city": "London"}, {"postcode": "545887", "city": "Paris"}]}'
The Schema.loads() and Schema.dumps() methods accept and produce JSON strings. You can also work with plain Python dictionaries and lists (the types that would be serialisable to JSON using the standard library json module), via Schema.load() and Schema.dump().
For more complex setups you may need to configure the exact validation rules for fields, or exclude some fields from serialisation. You do this with the standard dataclasses.field() function, passing in Marshmallow field options via the metadata argument. marshmallow-dataclass can work out what exact Marshmallow field type to use, but you can always override this. And you can use its NewType() helper to define reusable definitions for this; SomeType = NewType("SomeType", python_type, field=MarshmallowField, **field_args) lets you mark dataclass fields as field_name: SomeType in your project.
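To make the metadata mechanism concrete, here is a minimal standard-library-only sketch of how options attached through dataclasses.field() can be read back, which is the hook a library such as marshmallow-dataclass relies on. The describe() helper is hypothetical, and required and data_key are used purely as example keys; check the Marshmallow documentation for the options your version honours.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Address:
    city: str
    # Example metadata keys only; which keys are honoured is up to the
    # library that reads them, not to the dataclasses module.
    postcode: str = field(metadata={"required": True, "data_key": "postCode"})


@dataclass
class Person:
    name: str
    addresses: List[Address] = field(
        default_factory=list, metadata={"required": False}
    )


def describe(cls):
    """Hypothetical helper: map each field name to its metadata dict."""
    return {f.name: dict(f.metadata) for f in fields(cls)}


print(describe(Address))
print(describe(Person))
```

Fields declared without field() simply carry an empty metadata mapping, so a schema generator can fall back to defaults derived from the type hint.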
Marshmallow is, at least for me, the Swiss Army Knife project of serialisation and deserialisation, and there are lots of resources that integrate with Marshmallow. E.g. I'm looking at building several RESTful APIs for a customer at the moment, and I'll definitely be using Flask-Smorest to define the API endpoints and generate OpenAPI documentation at the same time. And all I have to do is create the SQLAlchemy models for this, really.
Here is an example Flask application based on your Person & Address schema, but defined as SQLAlchemy models and served as a RESTful API:
# pip install Flask flask-marshmallow flask-smorest flask-sqlalchemy marshmallow-sqlalchemy
import marshmallow
from flask import Flask
from flask.views import MethodView
from flask_marshmallow import Marshmallow
from flask_smorest import Api, Blueprint, abort
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['API_TITLE'] = 'ContactBook'
app.config['API_VERSION'] = 'v1'
app.config['OPENAPI_VERSION'] = '3.0.3'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'
api = Api(app)
db = SQLAlchemy(app)
ma = Marshmallow(app)
class Address(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    city = db.Column(db.String)
    postcode = db.Column(db.String)
    person_id = db.Column(db.Integer, db.ForeignKey('person.id'), nullable=False)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    addresses = db.relationship('Address', backref='person', lazy=True)

# create tables in the (in-memory, temporary) database;
# newer Flask-SQLAlchemy releases require an application context for this
with app.app_context():
    db.create_all()
class BaseSQLAlchemyAutoSchema(ma.SQLAlchemyAutoSchema):
    def update(self, instance, **data):
        # copy only the fields that were actually supplied onto the instance
        for fname in self.fields:
            if fname not in data:
                continue
            setattr(instance, fname, data.get(fname))

class AddressSchema(BaseSQLAlchemyAutoSchema):
    class Meta:
        table = Address.__table__

class PersonSchema(BaseSQLAlchemyAutoSchema):
    class Meta:
        table = Person.__table__

    addresses = ma.List(ma.Nested(AddressSchema(unknown=marshmallow.EXCLUDE)))

class PersonQueryArgsSchema(ma.Schema):
    name = ma.String()
    city = ma.String()
blp = Blueprint(
    "people", "people", url_prefix="/people", description="Operations on people"
)

@blp.route("/")
class People(MethodView):
    @blp.arguments(PersonQueryArgsSchema, location="query")
    @blp.response(200, PersonSchema(many=True))
    def get(self, args):
        """List people"""
        query = Person.query
        if args.get("name"):
            query = query.filter(Person.name == args["name"])
        if args.get("city"):
            query = query.filter(Person.addresses.any(Address.city == args["city"]))
        return query

    @blp.arguments(PersonSchema(unknown=marshmallow.EXCLUDE))
    @blp.response(201, PersonSchema)
    def post(self, new_person):
        """Add a new person"""
        addresses = new_person.pop("addresses", ())
        person = Person(**new_person)
        for address in addresses:
            person.addresses.append(Address(**address))
        db.session.add(person)
        db.session.commit()
        return person

@blp.route("/<person_id>")
class PersonById(MethodView):
    @blp.response(200, PersonSchema)
    def get(self, person_id):
        """Get person by ID"""
        return Person.query.get_or_404(person_id)

    @blp.arguments(PersonSchema(unknown=marshmallow.EXCLUDE, exclude=('addresses',)))
    @blp.response(200, PersonSchema)
    def put(self, updated_person_data, person_id):
        """Update existing person"""
        person = Person.query.get_or_404(person_id)
        PersonSchema().update(person, **updated_person_data)
        db.session.commit()
        return person

    @blp.response(204)
    def delete(self, person_id):
        """Delete person"""
        db.session.delete(Person.query.get_or_404(person_id))
        # commit so the deletion is actually persisted
        db.session.commit()

api.register_blueprint(blp)
Voila, a full-featured REST API that lets us list, create, update and delete Person entries.
You can use dataclasses and the dacite library to solve this problem. Here's my example:
from dataclasses import dataclass, asdict
from typing import List
from dacite import from_dict

@dataclass
class Address(object):
    city: str
    postcode: str

@dataclass
class Person():
    name: str
    addresses: List[Address]
So if you want to serialise a Person instance you can do:
address1 = Address("London", "EC2 2FA")
address2 = Address("Paris", "545887")
person = Person(name='John', addresses=[address1, address2])
json = asdict(person)
print(json)
Which will print your person information as:
{'name': 'John', 'addresses': [{'city': 'London', 'postcode': 'EC2 2FA'}, {'city': 'Paris', 'postcode': '545887'}]}
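Note that asdict() returns a plain Python dict, not JSON text. As a small sketch of the remaining step, the standard library json module turns that dict into a JSON string and back. The variable names as_dict and as_json are chosen here so that nothing shadows the json module.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Address:
    city: str
    postcode: str


@dataclass
class Person:
    name: str
    addresses: List[Address]


person = Person("John", [Address("London", "EC2 2FA"), Address("Paris", "545887")])

as_dict = asdict(person)       # nested dataclasses become nested dicts
as_json = json.dumps(as_dict)  # a str holding the JSON document

print(type(as_dict).__name__, type(as_json).__name__)
print(as_json)
```

Going the other way, json.loads(as_json) gives back an equal dict, which is the shape of input that dict-based loaders expect.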
Although a native way was preferred, there is no easy way to meet all the requirements using only the standard library. Assuming that you don't want to drop any requirement, the simplest solution I found is the dacite library. Its main entry point is a single function, from_dict(data_class, data), which takes a dict (not a JSON string) and takes care of creating nested dataclasses and ignoring extra keys in the input, among many other things.
person2 = from_dict(Person, json)
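To illustrate what such a loader has to do, here is a rough standard-library-only sketch. It is not dacite's implementation: build() and _convert() are hypothetical names, and the sketch only handles nested dataclasses and List[...] fields, with no type validation.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, get_args, get_origin, get_type_hints


def build(cls, data):
    """Hypothetical helper: recursively build dataclass `cls` from a dict."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name in data:  # keys in `data` that are not fields are ignored
            kwargs[f.name] = _convert(hints[f.name], data[f.name])
    return cls(**kwargs)  # a missing required field raises TypeError here


def _convert(tp, value):
    # Nested dataclass: recurse with the annotated class.
    if isinstance(tp, type) and is_dataclass(tp) and isinstance(value, dict):
        return build(tp, value)
    # List[X]: convert every element using X.
    if get_origin(tp) is list:
        args = get_args(tp)
        item_type = args[0] if args else Any
        return [_convert(item_type, item) for item in value]
    return value  # anything else is passed through unchanged


@dataclass
class Address:
    city: str
    postcode: str


@dataclass
class Person:
    name: str
    addresses: List[Address]


data = {
    "name": "John",
    "addresses": [
        {"postcode": "EC2 2FA", "city": "London"},
        {"city": "Paris", "postcode": "545887", "extra_attribute": ""},
    ],
}
p = build(Person, data)
print(p)
```

A real library adds type checking, Optional and Union handling, and configurable strictness, which is why using one is usually preferable to maintaining a helper like this.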
This complies with all your requirements:
# from_dict() expects a dict; if you start from a JSON string, parse it first with the standard library's json.loads()
json = { "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}
p = from_dict(Person, json)
print(type(p)) # should print Person
for a in p.addresses:
    print(type(a)) # prints Address
    print(a.city) # should print London then Paris
json2 = asdict(p)
print(json)
print(json2)
Results in:
<class '__main__.Person'>
<class '__main__.Address'>
London
<class '__main__.Address'>
Paris
{'name': 'John', 'addresses': [
{'postcode': 'EC2 2FA', 'city': 'London'},
{'city': 'Paris', 'postcode': '545887', 'extra_attribute': ''}
]}
{'name': 'John', 'addresses': [
{'city': 'London', 'postcode': 'EC2 2FA'},
{'city': 'Paris', 'postcode': '545887'}
]}
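Warning: json will not be equal to json2 in this case, because the extra_attribute key is dropped when loading. The different key order produced by asdict(p), which follows field declaration order, does not matter by itself, since dict comparison ignores key order. Nonetheless, objects created using this json2 will have equal values to the objects created with json.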
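A quick check of which difference actually breaks the comparison, using only the standard library. The loading step is simulated by hand here (an assumption standing in for a loader that ignores unknown keys), so the snippet runs without any third-party package.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str
    postcode: str


loaded = {"postcode": "545887", "city": "Paris", "extra_attribute": ""}

# Stand-in for a loader that ignores unknown keys.
address = Address(city=loaded["city"], postcode=loaded["postcode"])
dumped = asdict(address)

same_keys_other_order = {"postcode": "545887", "city": "Paris"}

print(list(dumped))                     # field declaration order
print(dumped == same_keys_other_order)  # key order does not affect equality
print(dumped == loaded)                 # the dropped key does
```

If an exact round trip matters, either reject unknown keys on load or compare only the keys that the model declares.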
First:
pip install dacite
Second: create dto.py
import logging
from typing import Optional, List, cast
from dataclasses import dataclass
from dacite import from_dict

logging.basicConfig(
    filename='response.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d:%(process)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)

# Unique marker meaning "field was not supplied"; cast() only quiets type checkers.
SENTINEL = cast(None, object())

@dataclass
class Address:
    city: Optional[str] = SENTINEL
    postcode: Optional[str] = SENTINEL

    def asdict(self):
        # Leave out fields that still hold the sentinel.
        return {k: v for k, v in self.__dict__.items() if v is not SENTINEL}

@dataclass
class Person:
    name: Optional[str] = SENTINEL
    addresses: Optional[List[Address]] = SENTINEL

    def asdict(self):
        return {k: v for k, v in self.__dict__.items() if v is not SENTINEL}

if __name__ == '__main__':
    SAMPLE = {
        "name": "John",
        "addresses": [
            {
                "postcode": "EC2 2FA",
                "city": "London"
            },
            {
                "city": "Paris",
                "postcode": "545887",
                "extra_attribute": ""
            }
        ]
    }
    try:
        targetClass = (
            Address
        )
        INFORMATION = from_dict(
            data_class=Person,
            data=SAMPLE
        )
        # TODO: Should be omitted (Just for your questions).
        logging.info(
            " -- type(INFORMATION): " + str(type(INFORMATION))
        )
        # TODO: Should be omitted (Just for your questions).
        for a in INFORMATION.addresses:
            logging.info(
                " -- type(a): " + str(type(a))
            )
            logging.info(
                " -- a.city: " + str(a.city)
            )
        INFORMATION = INFORMATION.asdict()
        # Convert nested Address objects (single or in a list) to dicts as well.
        for key, value in INFORMATION.items():
            if isinstance(value, targetClass):
                INFORMATION.update({key: value.asdict()})
            if isinstance(value, list) and value and isinstance(value[0], targetClass):
                INFORMATION.update({key: [v.asdict() for v in value]})
    except Exception as e:
        logging.error(
            'Error: {}'.format(e)
        )
    finally:
        # TODO: Should be omitted (Just for your questions).
        logging.info(
            " -- json: " + str(SAMPLE)
        )
        # TODO: Should be omitted (Just for your questions).
        logging.info(
            " -- json2: " + str(INFORMATION)
        )
        # TODO: Should be omitted (Just for your questions).
        logging.info(
            " -- json2 == json: " + str(INFORMATION == SAMPLE)
        )
Third: see response.log
2021-03-11 12:49:08 INFO [dto.py:66:42426] -- type(INFORMATION): <class '__main__.Person'>
2021-03-11 12:49:08 INFO [dto.py:72:42426] -- type(a): <class '__main__.Address'>
2021-03-11 12:49:08 INFO [dto.py:76:42426] -- a.city: London
2021-03-11 12:49:08 INFO [dto.py:72:42426] -- type(a): <class '__main__.Address'>
2021-03-11 12:49:08 INFO [dto.py:76:42426] -- a.city: Paris
2021-03-11 12:49:08 INFO [dto.py:92:42426] -- json: {'name': 'John', 'addresses': [{'postcode': 'EC2 2FA', 'city': 'London'}, {'city': 'Paris', 'postcode': '545887', 'extra_attribute': ''}]}
2021-03-11 12:49:08 INFO [dto.py:96:42426] -- json2: {'name': 'John', 'addresses': [{'city': 'London', 'postcode': 'EC2 2FA'}, {'city': 'Paris', 'postcode': '545887'}]}
2021-03-11 12:49:08 INFO [dto.py:100:42426] -- json2 == json: False
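The manual loop in dto.py converts one level of nesting for one known class. As a possible generalisation of the same sentinel idea, here is a standard-library-only sketch of a recursive converter; UNSET and to_dict() are hypothetical names introduced for this example, and dacite is not needed to run it.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional

UNSET: Any = object()  # hypothetical marker, plays the role of SENTINEL


def to_dict(value):
    """Recursively convert dataclass instances to dicts, skipping UNSET fields."""
    if is_dataclass(value) and not isinstance(value, type):
        return {
            f.name: to_dict(getattr(value, f.name))
            for f in fields(value)
            if getattr(value, f.name) is not UNSET
        }
    if isinstance(value, (list, tuple)):
        return [to_dict(item) for item in value]
    if isinstance(value, dict):
        return {key: to_dict(item) for key, item in value.items()}
    return value


@dataclass
class Address:
    city: Optional[str] = UNSET
    postcode: Optional[str] = UNSET


@dataclass
class Person:
    name: Optional[str] = UNSET
    addresses: Optional[List[Address]] = UNSET


person = Person(name="John", addresses=[Address(city="London"), Address()])
print(to_dict(person))
```

Because the check uses identity (is not UNSET), a field explicitly set to None is still emitted, which keeps "missing" and "null" distinguishable in the output.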