
What are the best practices for combining marshmallow schema definitions and OO in Python? [closed]

Assume a simple schema defined in marshmallow:

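from marshmallow import Schema, fields

class AddressSchema(Schema):
    street = fields.String(required=True)
    city = fields.String(required=True)
    # Used when "country" is absent from the input (marshmallow >= 3.13)
    country = fields.String(load_default='USA')

class PersonSchema(Schema):
    name = fields.String(required=True)
    address = fields.Nested(AddressSchema())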

The use case here is applications working with in-memory objects, and serialization/deserialization to JSON, i.e. no SQL database.

Using the standard json library I can parse JSON objects that conform to this schema and access fields with expressions such as person1['address']['city'], but the typo-prone string keys and verbose syntax are somewhat unsatisfactory.
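For comparison, here is a minimal sketch of the plain json approach using only the standard library. The sample document and the person1 name are illustrative. A mistyped key is only caught when the line runs, as a KeyError, and no schema defaults are applied:

```python
import json

# Illustrative input conforming to PersonSchema (sample data only).
raw = '{"name": "Ada", "address": {"street": "1 Main St", "city": "Boston"}}'

person1 = json.loads(raw)          # nested plain dicts
city = person1["address"]["city"]  # string keys at every level

# A typo in a key is not detected until this line actually executes.
try:
    person1["adress"]["city"]
except KeyError as exc:
    missing_key = exc.args[0]
```

Note that json.loads knows nothing about the schema, so the 'USA' default for country is never filled in.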

Hand-crafted OO model

I could define a parallel OO model, and annotate my schema with @post_load decorators, for example:

class Address(object):
    def __init__(self, street, city, country='USA'):
        self.street = street
        self.city = city
        self.country = country

class Person(object):
    def __init__(self, name, address=None):
        self.name = name
        self.address = address

But the repetition is unappealing: every field is declared twice, and I haven't even added field descriptions to the schema yet.

No OO model

Arguably the explicit OO model doesn't buy much: it's basic data accessors with no behavior. I could get some syntactic sugar using jsobject, so that I could write, for example, person1.address.city. But this doesn't seem quite right either. As a developer I have no explicit Python class API to consult to determine which fields exist; I can refer to the marshmallow schema, but that feels very indirect.
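The attribute-access idea can be sketched with the standard library alone. This example uses types.SimpleNamespace through json.loads(..., object_hook=...) as a stand-in for jsobject, whose own API is not shown here. It gives person1.address.city, but nothing declares which attributes exist:

```python
import json
from types import SimpleNamespace

raw = '{"name": "Ada", "address": {"street": "1 Main St", "city": "Boston"}}'

# object_hook is called for every decoded JSON object, innermost first,
# so nested objects become namespaces too.
person1 = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

city = person1.address.city
# No schema is consulted: an absent field only shows up at access time.
has_country = hasattr(person1.address, "country")
```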

Code Generation

It would be fairly easy to generate the OO code above from the marshmallow schema definitions, and I'm surprised there seems to be no such library. Perhaps code generation is considered very unpythonic? It would of course only be suitable for data-access style class definitions; adding non-generic behavior to generated code would be strictly a no-no.

Users of the code would not need to know that a codegen approach was used: everything would be there with an explicit API, with docs visible alongside the rest of the code on readthedocs, etc.

Dynamic Classes

The other approach would be dynamic classes derived from the marshmallow definitions. Again, as far as I can tell there is no such library (although given the impressive range of dynamic class generation approaches in Python, I may have missed one). Arguably this would not buy much over the jsobject approach, but there may be some advantages: it would be possible to interweave the generated classes with explicit code that defines behavior. The downside of a dynamic approach is that the Python world favors explicit over implicit.

What's most pythonic?

The lack of libraries here means I'm either not finding something or not looking at this in a suitably pythonic way. I'm happy to contribute something to PyPI, but before adding yet another meta-OO library I wanted to be sure I had done due diligence.

Chris Mungall asked Aug 21 '17 21:08


People also ask

What are marshmallow schemas?

In short, marshmallow schemas can be used to:

  1. Validate input data.
  2. Deserialize input data to app-level objects.
  3. Serialize app-level objects to primitive Python types.

The serialized objects can then be rendered to standard formats such as JSON for use in an HTTP API.

What is the use of marshmallow in Python?

Marshmallow is a Python library that converts complex data types, such as objects, to and from native Python data types. It is a powerful tool for both validating and converting data.

What is Marshmallow dump?

The main component of Marshmallow is a Schema. A schema defines the rules that guide deserialization (called load) and serialization (called dump). It lets you define which fields will be loaded or dumped and add requirements to those fields, such as validators or marking them as required.


1 Answer

Your question is quite vague, so my answer will be too, and it's quite subjective; I hope that's OK. I'm just some dude who spent the day reading about serialization options in Python.

I think that Marshmallow is fundamentally unpythonic and that there isn't a great way to use it, so I don't intend to use it. I'll give two examples that, for me, are definitive.

  1. You have a class that has, as a field, a mixed-type list of other objects. This is Python, so you're allowed to do that. In Marshmallow you can't deal with this natively or neatly. There's a very natural solution, which is to serialize and deserialize the list element by element using each class's registered serializer. But Marshmallow doesn't register serializers for classes, so you'd have to add that yourself: write your own code and change the serializer of every class that could appear in the list, to ensure it's registered in something you pass to a nested field. There's an issue describing this.
  2. You want to serialize a mixed (literal) dict mapping strings to instances of unknown classes. This is pretty much the same as above: you'd have to implement it yourself. Plus, you'd have to write a serializer/deserializer yourself for every primitive you wanted to do this for. They wrote code for primitives as fields but not as schemas, which to me is neither "batteries included" nor "one obvious way to do it".
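The "natural solution" from point 1 can be sketched in plain Python, without Marshmallow. It keeps a registry that maps each class to a pair of to/from-primitive functions, tags each encoded object with its class name, and recurses through lists and dicts so mixed containers need no per-container schema. Everything here is hypothetical and only illustrates the idea, not any library's API: register, encode, decode, the "__type__" tag, and the Point/Tag classes.

```python
import json

# Hypothetical registry: class name -> (class, to_dict, from_dict)
REGISTRY = {}


def register(cls, to_dict, from_dict):
    REGISTRY[cls.__name__] = (cls, to_dict, from_dict)


def encode(obj):
    """Recursively turn obj into JSON-compatible primitives."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, list):
        return [encode(item) for item in obj]
    if isinstance(obj, dict):
        return {key: encode(value) for key, value in obj.items()}
    cls, to_dict, _ = REGISTRY[type(obj).__name__]  # KeyError if unregistered
    return {"__type__": cls.__name__, "data": encode(to_dict(obj))}


def decode(data):
    """Inverse of encode: rebuild registered objects from tagged dicts."""
    if isinstance(data, list):
        return [decode(item) for item in data]
    if isinstance(data, dict):
        if "__type__" in data:
            _, _, from_dict = REGISTRY[data["__type__"]]
            return from_dict(decode(data["data"]))
        return {key: decode(value) for key, value in data.items()}
    return data


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Tag:
    def __init__(self, label):
        self.label = label


register(Point, lambda p: {"x": p.x, "y": p.y}, lambda d: Point(**d))
register(Tag, lambda t: {"label": t.label}, lambda d: Tag(**d))

mixed = [Point(1, 2), Tag("a"), 3, {"nested": Tag("b")}]
restored = decode(json.loads(json.dumps(encode(mixed))))
```

In this sketch, a user dict that happens to contain a "__type__" key would be misread as a tagged object; a real implementation would need an escaping rule.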

General serialization libraries are something of a deep rabbit hole, because what you're really talking about requires elements of a type system, a parser and a graph traversal algorithm all in one go. Marshmallow doesn't do recursive parsing by default, so it fails point (2). On point (1) (partly because of point 2), it either requires extensive hacking or requires you to accept a Java-like type system (everything is of a known, enumerated type).

You asked about general serialization libraries: I found the camel library, and the blog post around it, interesting. For pickle, there's a powerful extension called dill, and a mixin that handles versioning.
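For contrast with the Marshmallow examples, the standard-library pickle module round-trips a mixed list of module-level class instances without any per-class serializer. dill, mentioned above, extends what can be pickled and is not used here. The Point and Tag classes are illustrative:

```python
import pickle


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Tag:
    def __init__(self, label):
        self.label = label


mixed = [Point(1, 2), Tag("a"), 3, {"nested": Tag("b")}]
# No registration needed: pickle records each object's class by module and name.
restored = pickle.loads(pickle.dumps(mixed))
```

The trade-offs are that the format is Python-specific rather than JSON, and pickle data must never be loaded from untrusted sources.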

zimablue answered Oct 18 '22 02:10