I have a project where I would like to store a large structure (nested objects) in a relational db (Postgres). It's part of a larger structure and I don't really care about the serialization format - I'm happy for it to be a blob in a column - I'd just like to be able to persist and restore it fairly quickly.
For my purposes the SQLAlchemy PickleType mostly does the job. The issue I have is that I'd like the dirty checks to work (something that the Mutable Types are used for). I'd like them to work not only if I change info in the paths but also in the bounds (which sit another level down).
class Group(Base):
__tablename__ = 'group'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
paths = Column(types.PickleType)
class Path(object):
def __init__(self, style, bounds):
self.style = style
self.bounds = bounds
class Bound(object):
def __init__(self, l, t, r, b):
self.l = l
self.t = t
self.r = r
self.b = b
# this is all fine
g = Group(name='g1', paths=[Path('blah', Bound(1,1,2,3)),
Path('other_style', Bound(1,1,2,3)),])
session.add(g)
session.commit()
# so is this
g.name = 'g2'
assert g in session.dirty
session.commit()
# but this won't work without some sort of tracking on the deeper objects
g.paths[0].style = 'something else'
assert g in session.dirty # nope
I've played around with the Mutable types trying to get it working but haven't had any luck. Elsewhere I do use the mutable types for a json column which is fine - in a way that seems simpler though because with these classes you need to track changes to objects within the objects too.
Any thoughts appreciated.
create_all() function to create the tables that are associated with your models. In this case you only have one model, which means that the function call will only create one table in your database: from app import db, Student.
PickleType. Holds Python objects, which are serialized using pickle.
From SQLAlchemy docs: nullable – If set to the default of True, indicates the column will be rendered as allowing NULL, else it's rendered as NOT NULL. This parameter is only used when issuing CREATE TABLE statements.
First of all, as you realized, you have to track changes to objects within the objects, since there's no way for SQLAlchemy to know an inner object changed. So, we'll get that out of the way with a base mutable object we can use for both:
class MutableObject(Mutable, object):
@classmethod
def coerce(cls, key, value):
return value
def __getstate__(self):
d = self.__dict__.copy()
d.pop('_parents', None)
return d
def __setstate__(self, state):
self.__dict__ = state
def __setattr__(self, name, value):
object.__setattr__(self, name, value)
self.changed()
class Path(MutableObject):
def __init__(self, style, bounds):
super(MutableObject, self).__init__()
self.style = style
self.bounds = bounds
class Bound(MutableObject):
def __init__(self, l, t, r, b):
super(MutableObject, self).__init__()
self.l = l
self.t = t
self.r = r
self.b = b
And we also need to track the changes on the paths list, so, we have to make that a mutable object too. However, the Mutable tracks changes in children by propagating them to the parents when the changed() method is called, and the current implementation in SQLAlchemy seems to only assign a parent to someone assigned as an attribute, not as an item of a sequence, like a dictionary or a list. This is where things get complicated.
I think the list items should have the list itself as a parent, but that doesn't work for two reasons: first, the _parents weakdict can't take a list for a key, and second, the changed() signal doesn't propagate all the way to the top, so, we will just be marking the list itself as changed. I'm not 100% sure how correct this is, but the way to go seems to be assigning the list's parent to every item, so the group object gets the flag_modified call when an item is changed. This should do it.
class MutableList(Mutable, list):
@classmethod
def coerce(cls, key, value):
if not isinstance(value, MutableList):
if isinstance(value, list):
return MutableList(value)
value = Mutable.coerce(key, value)
return value
def __setitem__(self, key, value):
old_value = list.__getitem__(self, key)
for obj, key in self._parents.items():
old_value._parents.pop(obj, None)
list.__setitem__(self, key, value)
for obj, key in self._parents.items():
value._parents[obj] = key
self.changed()
def __getstate__(self):
return list(self)
def __setstate__(self, state):
self[:] = state
However, there's one last issue here. The parents get assigned by a call listening on the 'load' event, so at the time of initialization, the _parents dict is empty, and the children get nothing assigned. I think maybe there's some cleaner way you can do that by listening on the load event too, but I figured the dirty way to do it would be to reassign the parents when the items are retrieve, so, add this:
def __getitem__(self, key):
value = list.__getitem__(self, key)
for obj, key in self._parents.items():
value._parents[obj] = key
return value
Finally, we have to use that MutableList on Group.paths:
class Group(BaseModel):
__tablename__ = 'group'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String, nullable=False)
paths = db.Column(MutableList.as_mutable(types.PickleType))
And with all this your test code should work:
g = Group(name='g1', paths=[Path('blah', Bound(1,1,2,3)),
Path('other_style', Bound(1,1,2,3)),])
session.add(g)
db.session.commit()
g.name = 'g2'
assert g in db.session.dirty
db.session.commit()
g.paths[0].style = 'something else'
assert g in db.session.dirty
Frankly, I'm not sure how safe it is to get this on production, and if you don't need a flexible schema, you'd be probably better using a table and relationships for Path and Bound.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With