I am using YAML and SQLAlchemy. I defined my object, and I can dump it with YAML just fine. However, when I try to use YAML on the object returned from a SQLAlchemy query, it fails with the error `can't pickle int objects`. I printed the instance returned from SQLAlchemy, and it shows the correct type. I'll let the code do the talking:
    from sqlalchemy import Column, Integer, Text, VARBINARY
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class HashPointer(Base):
        __tablename__ = 'hash_pointers'

        id = Column(Integer, primary_key=True)
        hash_code = Column(VARBINARY(64), unique=True)
        file_pointer = Column(Text)

        def __init__(self, hash_code, file_pointer):
            self.hash_code = hash_code
            self.file_pointer = file_pointer

        def __repr__(self):
            return "<HashPointer('%s', '%s')>" % (self.hash_code, self.file_pointer)
    import yaml

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    Engine = create_engine("mysql://user:pass@localhost/db", echo=True)
    Session = sessionmaker(bind=Engine)
    session = Session()

    fhash = HashPointer(0x661623708235, "c:\\test\\001.txt")

    # PRINTS FINE
    print(yaml.dump(fhash))

    for instance in session.query(HashPointer).all():
        # PRINTS FINE AS __repr__
        print(instance)

        # THROWS ERROR, 'CAN'T PICKLE INT OBJECTS'
        print(yaml.dump(instance))
Try adding the following to your class:
    def __reduce__(self):
        'Return state information for pickling'
        return self.__class__, (int(self.hash_code), str(self.file_pointer))
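PyYAML's object dumper goes through the same pickle-protocol hooks, so the effect of `__reduce__` can be sketched with just the stdlib pickler. The class and attribute names below are plain-Python stand-ins (not a real SQLAlchemy mapping); an unpicklable attribute simulates the hidden state, and `__reduce__` sidesteps it entirely:

```python
import pickle

class HashPointerLike:
    """Stand-in for the mapped HashPointer class (illustrative only)."""
    def __init__(self, hash_code, file_pointer):
        self.hash_code = hash_code
        self.file_pointer = file_pointer
        # unpicklable stand-in for SQLAlchemy's hidden state
        self._sa_instance_state = lambda: None

    def __reduce__(self):
        'Return state information for pickling'
        return self.__class__, (int(self.hash_code), str(self.file_pointer))

fhash = HashPointerLike(0x661623708235, "c:\\test\\001.txt")
# without __reduce__, pickling would choke on the lambda attribute;
# with it, only the two declared members are serialized
clone = pickle.loads(pickle.dumps(fhash))
print(clone.file_pointer)  # c:\test\001.txt
```

On unpickling, the class is called again with the saved arguments, so the reconstructed object gets a fresh (unpickled-free) instance of everything `__init__` sets up.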
It turns out that the default `__reduce_ex__` method (I'm fairly sure this is the one defined on `object`, though it doesn't have to be) that comes down the line when you have SQLAlchemy active adds a `_sa_instance_state` member to the 'state' returned through the `__reduce_ex__` API, which PyYAML uses to perform serialization.
When serializing an object coming from a SQLAlchemy query, this is essentially a hidden part of the object's metadata, which is accessible to further operations. It is this object on which the PyYAML serializer is failing. You can verify this by running your serialization under PDB and looking for two calls to `represent_object` in your call stack, even for relatively simple SQLAlchemy query results.
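You can also see the extra member without a debugger by calling `__reduce_ex__` directly. The class below is a plain-Python stand-in (the attribute name is borrowed from SQLAlchemy, not a real mapping) showing how the hidden member rides along in the reduce state:

```python
class Demo:
    """Plain class standing in for a mapped SQLAlchemy instance."""
    def __init__(self):
        self.value = 1
        # stand-in for SQLAlchemy's hidden bookkeeping member
        self._sa_instance_state = object()

d = Demo()
# slot 2 of the reduce tuple is the state dict that PyYAML serializes;
# the hidden member shows up alongside the declared attribute
state = d.__reduce_ex__(2)[2]
print(sorted(state))  # ['_sa_instance_state', 'value']
```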
This query-instance link is used, as I understand it, to power methods that let you poke back at the query which generated a given object, within the same Python interpreter's lifetime.
If you care about that functionality (things like `session.new` and `session.dirty`), you will need to implement support for it in PyYAML's serializer. If you don't care, and just want your declared members, you can use a base class which hides that linkage from calls to `__reduce__`/`__reduce_ex__`. Note that this will also break the SQLAlchemy serializer extension, so consider your plans carefully.
An example of a base class implementing that change:

    from sqlalchemy.ext.declarative import declarative_base

    DeclBase = declarative_base()

    class Base(DeclBase):
        __abstract__ = True

        def __reduce_ex__(self, proto):
            ret = super(Base, self).__reduce_ex__(proto)
            # shallow-copy the state dict before mutating it
            ret = (ret[0], ret[1], dict(ret[2])) + ret[3:]
            # drop the state that YAML cannot serialize
            ret[2].pop('_sa_instance_state', None)
            return ret
This will then allow you to round-trip your objects in and out of YAML, though a round trip will disassociate them from any pending transactions or queries. It can also interact badly with lazy-loaded members, for one example, so make sure you are serializing everything you expect.
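The filtering pattern itself can be exercised without SQLAlchemy installed; here is a sketch using stand-in class names and an unpicklable attribute in place of SQLAlchemy's hidden state:

```python
import pickle

class StripStateMixin:
    """Sketch of the filtering base class, minus the SQLAlchemy parts:
    remove _sa_instance_state from the state handed to the serializer."""
    def __reduce_ex__(self, proto):
        ret = super().__reduce_ex__(proto)
        # shallow-copy the state dict so the live instance is untouched
        ret = (ret[0], ret[1], dict(ret[2])) + ret[3:]
        ret[2].pop('_sa_instance_state', None)
        return ret

class Record(StripStateMixin):
    def __init__(self, hash_code, file_pointer):
        self.hash_code = hash_code
        self.file_pointer = file_pointer
        # unpicklable stand-in for SQLAlchemy's instance state
        self._sa_instance_state = lambda: None

rec = Record(0x66, "c:\\test\\001.txt")
# without the mixin this would fail on the lambda; with it,
# the round trip keeps only the declared members
copy = pickle.loads(pickle.dumps(rec))
print(hasattr(copy, '_sa_instance_state'))  # False
print(copy.file_pointer)  # c:\test\001.txt
```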
NOTE/EDIT: I chose to use `__reduce_ex__` here to be compatible with possible other base classes or mixins. According to https://docs.python.org/2/library/pickle.html#object.reduce_ex, this produces the correct behavior for any base classes, and also handles the case where only `__reduce__()` was declared.
Redux... `__reduce_ex__` returns the actual `__dict__` of the instance object; we don't want to delete from there, so in `__reduce_ex__` we must shallow-copy that dict first.
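This is easy to demonstrate: in CPython, the state slot returned by the default `__reduce_ex__` for a plain instance is the instance's own `__dict__`, not a copy, so popping from it without the `dict(ret[2])` shallow copy would silently strip the attribute off the live object (the class below is just for illustration):

```python
class Plain:
    def __init__(self):
        self.x = 1

p = Plain()
ret = p.__reduce_ex__(2)
# the state entry is the same dict object the instance uses,
# which is why a shallow copy is taken before popping anything
print(ret[2] is p.__dict__)  # True
```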