What are the best practices for combining marshmallow schema definitions and OO in Python? [closed]

Assume a simple schema defined in marshmallow

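from marshmallow import Schema, fields

class AddressSchema(Schema):
    street = fields.String(required=True)
    city = fields.String(required=True)
    # `default` is the marshmallow 2 spelling; 3.13+ renames it `dump_default`
    country = fields.String(default='USA')

class PersonSchema(Schema):
    name = fields.String(required=True)
    address = fields.Nested(AddressSchema())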

The use case here is applications working with in-memory objects, and serialization/deserialization to JSON, i.e. no SQL database.

Using the standard json library I can parse JSON objects that conform to this schema, and access objects in a manner such as person1['address']['city'], but the use of typo-prone strings in verbose syntax is somewhat unsatisfactory.

Hand-crafted OO model

I could define a parallel OO model, and annotate my schema with @post_load decorators, for example:

class Address(object):
    def __init__(self, street, city, country='USA'):
        self.street = street
        self.city = city
        self.country = country

class Person(object):
    def __init__(self, name, address=None):
        self.name = name
        self.address = address

But the repetition is not very nice (and I haven't even included descriptions in the schema).

No OO model

Arguably the explicit OO model doesn't buy much: it's basic data accessors with no behavior. I could get some syntactic sugar using jsobject, so that I could write, for example, person1.address.city. But this doesn't seem quite right either. As a developer I have no explicit Python class API to consult to determine which fields to use. I can refer to the marshmallow schema, but this feels very indirect.
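For comparison, the same attribute-style access can be had from the standard library alone. This sketch uses types.SimpleNamespace as a stand-in for jsobject (it is not jsobject's API), and the sample JSON is invented for illustration.

```python
import json
from types import SimpleNamespace

raw = (
    '{"name": "Ada", '
    '"address": {"street": "1 Main St", "city": "Springfield", "country": "USA"}}'
)

# object_hook is called for every decoded JSON object, innermost first,
# so nested objects become namespaces too
person1 = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

print(person1.address.city)
```

A misspelled attribute such as person1.address.ctiy only fails at runtime with AttributeError, and there is still no class definition to read, which is the drawback described above.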

Code Generation

It would be fairly easy to generate the OO code above from the marshmallow schema definitions. I'm surprised there seems to be no such library. Perhaps code generation is considered very unpythonic? It would of course be suitable only for data-access style class definitions; adding non-generic behavior would be strictly a no-no.

Users of the code would not need to know a codegen approach was used: everything would be there with an explicit API, with docs visible alongside the rest of the code on readthedocs etc.

Dynamic Classes

The other approach would be dynamic classes derived from the marshmallow definitions. Again, as far as I can tell there is no such library (although the range of dynamic class generation approaches in Python is impressive, so I may have missed some). Arguably this would not buy much over the jsobject approach, but there may be some advantages: it would be possible to interweave the dynamic classes with explicit code that defines behavior. The downside of a dynamic approach is that explicit is favored over implicit in the Python world.

What's most pythonic?

The lack of libraries here means I'm either not finding something, or am not looking at this in a suitably pythonic way. I'm happy to contribute something to PyPI, but before adding yet another meta-OO library I wanted to be sure I had done due diligence here.

asked Aug 21 '17 21:08

Chris Mungall

1 Answer

Your question is quite vague, so my answer will be too, and quite subjective; I hope that's OK. I am just some dude who spent the day reading about serialization options in Python.

I think that Marshmallow is fundamentally unpythonic and that there isn't a great way to use it; I don't intend to use it. I'll give what are, for me, two definitive examples.

1. You have a class that has, as a field, a mixed-type list of other objects. This is Python, so you're allowed to do that. In Marshmallow you can't deal with this natively or neatly. There's a very natural solution, which is to dump and load the list by dispatching each item to the serializer registered for its class. But in Marshmallow you'd have to write your own code and change the serializer of every class that could appear, to ensure it's registered in something you pass to a nested function. Marshmallow doesn't register serializers for classes; you'd have to add that. There is an issue describing this.
2. You want to serialize a mixed (literal) dict of strings to unknown classes. This is pretty much the same as above: you'd have to implement it yourself. Plus you'd have to write a serializer/deserializer yourself for every primitive you wanted to do this for. They wrote code for primitives as fields but not as schemas, which to me is neither batteries included nor one obvious way.

General serializing libraries are kind of a deep rabbit hole, because what you're really talking about requires elements of a type system, a parser and a graph traversal algorithm all in one go. Marshmallow doesn't do recursive parsing by default, so it fails point (2). On point (1) (partly because of point 2), it either requires extensive hacking or requires you to accept a Java-like type system (everything is of a known, enumerated type).
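To illustrate the "registered serializers" idea from point (1), here is a minimal standard-library sketch. It is not marshmallow's API: the registry, the "__type__" tag and the Cat and Invoice classes are all made up for the example, and it only handles flat objects whose attributes are JSON primitives.

```python
import json

_REGISTRY = {}


def register(cls):
    """Class decorator: remember cls under its name so it can be rebuilt later."""
    _REGISTRY[cls.__name__] = cls
    return cls


@register
class Cat:
    def __init__(self, name):
        self.name = name


@register
class Invoice:
    def __init__(self, total):
        self.total = total


def dump_item(obj):
    # Registered classes are tagged with their type name; primitives pass through
    if _REGISTRY.get(type(obj).__name__) is type(obj):
        return {"__type__": type(obj).__name__, **vars(obj)}
    return obj


def load_item(data):
    # A tagged dict is rebuilt through the registry; anything else is returned as-is
    if isinstance(data, dict) and "__type__" in data:
        kwargs = {k: v for k, v in data.items() if k != "__type__"}
        return _REGISTRY[data["__type__"]](**kwargs)
    return data


mixed = [Cat("Tom"), 42, Invoice(9.5), "plain string"]
text = json.dumps([dump_item(x) for x in mixed])
restored = [load_item(x) for x in json.loads(text)]
print([type(x).__name__ for x in restored])
```

Handling nested objects would need the recursive traversal mentioned above, which is where the rabbit hole starts.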

You asked about general serializing libraries. I found the library camel interesting, along with the blog post around it. For pickle, there's a powerful extension called dill and a mixin that handles versioning.

answered Oct 18 '22 02:10

zimablue
