 

Alternatives to pickle's `persistent_id`?

I have been using Python's pickle module for implementing a thin file-based persistence layer. The persistence layer (part of a larger library) relies heavily on pickle's persistent_id feature to save objects of specified classes as separate files.
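For reference, here is a minimal Python 3 sketch of the mechanism being described: a `pickle.Pickler` subclass whose `persistent_id` returns an id for selected objects, so only the id goes into the main pickle, plus an `Unpickler` whose `persistent_load` resolves it. The `Record` class and the in-memory `STORE` dict (a stand-in for one file per object) are hypothetical illustrations, not the library mentioned above.

```python
import io
import pickle

# Hypothetical stand-in for "one file per object": maps id -> pickled bytes.
STORE = {}


class Record:
    """Hypothetical class whose instances should be stored separately."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class StoringPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # A non-None return value makes pickle write only this id
        # instead of the object itself.
        if isinstance(obj, Record):
            key = "Record.%s" % obj.name
            STORE[key] = pickle.dumps(obj)  # plain pickling, no hook here
            return key
        return None  # pickle everything else inline, as usual


class StoringUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Receives each id written by persistent_id and returns the object.
        return pickle.loads(STORE[pid])


def dump(obj):
    buf = io.BytesIO()
    StoringPickler(buf).dump(obj)
    return buf.getvalue()


def load(data):
    return StoringUnpickler(io.BytesIO(data)).load()


top = {"records": [Record("a", 1), Record("b", 2)]}
data = dump(top)
print(sorted(STORE))  # ['Record.a', 'Record.b']
restored = load(data)
print(restored["records"][1].value)  # 2
```

The top-level pickle holds only the two ids; this is the behavior the question wants to reproduce in a text format.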

The only issue with this approach is that pickle files are not human editable, and I'd much rather have objects saved in a format that is human readable and editable with a text editor (e.g., YAML or JSON).

Do you know of any library that uses a human-editable format and offers features similar to pickle's persistent_id? Alternatively, do you have suggestions for implementing them on top of a YAML- or JSON-based serialization library, without rewriting a large subset of pickle?

asked Oct 10 '22 at 15:10 by Riccardo Murri



1 Answer

I haven't tried this yet myself, but I think you should be able to do this elegantly with PyYAML using what it calls "representers" and "constructors".
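As a minimal illustration of the two hooks (using a hypothetical `Point` class that is not part of the question, and assuming PyYAML is installed): a representer turns an object into a tagged YAML node when dumping, and a constructor registered for the same tag turns that node back into an object when loading. Registering them on `SafeDumper`/`SafeLoader` subclasses leaves PyYAML's global defaults untouched.

```python
import yaml


class Point:
    """Hypothetical example class, not taken from the question."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointDumper(yaml.SafeDumper):
    """Dumper subclass, so the representer is not registered globally."""


class PointLoader(yaml.SafeLoader):
    """Loader subclass carrying the matching constructor."""


def represent_point(dumper, point):
    # Representer: object -> YAML node with a custom tag
    return dumper.represent_scalar("!point", "%d,%d" % (point.x, point.y))


def construct_point(loader, node):
    # Constructor: YAML node with that tag -> object
    x, y = loader.construct_scalar(node).split(",")
    return Point(int(x), int(y))


PointDumper.add_representer(Point, represent_point)
PointLoader.add_constructor("!point", construct_point)

text = yaml.dump({"origin": Point(3, 4)}, Dumper=PointDumper)
print(text)
p = yaml.load(text, Loader=PointLoader)["origin"]
print(p.x, p.y)  # 3 4
```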

EDIT

After an extensive exchange of comments with the poster, here is a method to achieve the required behavior with PyYAML.

Important note: if a Persistable instance has another such instance as an attribute, or somewhere inside one of its attributes, the contained Persistable instance will not be saved to yet another separate file; rather, it will be saved inline in the same file as the parent Persistable instance. To the best of my understanding, this limitation also existed in the OP's pickle-based system, and it may be acceptable for his/her use cases. I haven't found an elegant solution for this that doesn't involve hacking yaml.representer.BaseRepresenter.
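A trimmed Python 3 reproduction of this limitation, assuming a recent PyYAML. The `OUT_DIR` scratch directory is a hypothetical addition so nothing is written to the working directory, and the representer is registered with `add_multi_representer` so that subclasses such as `Foo` and `Bar` are matched. Dumping a `Bar` that holds a `Foo` produces a single file, with the `Foo` written inline.

```python
import os
import tempfile

import yaml

# Hypothetical: write the per-object files into a scratch directory.
OUT_DIR = tempfile.mkdtemp()


class Persistable:
    _unique = 0

    def __init__(self):
        Persistable._unique += 1
        self.persistent_id = "%s.%d" % (type(self).__name__, Persistable._unique)


class Foo(Persistable):
    def __init__(self):
        super().__init__()
        self.one = 1


class Bar(Persistable):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo


def persistable_representer(dumper, data):
    with open(os.path.join(OUT_DIR, data.persistent_id), "w") as f:
        # The plain yaml.dump knows nothing about this representer, so any
        # Persistable nested inside `data` is written inline in this file.
        yaml.dump(data, f)
    return dumper.represent_scalar("!xref", data.persistent_id)


class PersistingDumper(yaml.Dumper):
    pass


# Multi-representer: also matches subclasses such as Foo and Bar.
PersistingDumper.add_multi_representer(Persistable, persistable_representer)

bar = Bar(Foo())
top = yaml.dump(bar, Dumper=PersistingDumper)
files = sorted(os.listdir(OUT_DIR))
with open(os.path.join(OUT_DIR, bar.persistent_id)) as f:
    bar_file = f.read()

print(top)       # an !xref reference to Bar.2
print(files)     # ['Bar.2']: no separate file for the nested Foo
print(bar_file)  # the Foo appears inline as a !!python/object
```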

import yaml
from functools import partial

class Persistable(object):
    # simulate a unique id
    _unique = 0

    def __init__(self, *args, **kw):
        Persistable._unique += 1
        self.persistent_id = ("%s.%d" %
                              (self.__class__.__name__, Persistable._unique))

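def persistable_representer(dumper, data):
    pid = data.persistent_id
    print("Writing to file: %s" % pid)
    # Write the object itself with the plain yaml.dump (default Dumper), so
    # this representer is not triggered again for the same object
    with open(pid, 'w') as outfile:
        yaml.dump(data, outfile)
    # In the referring document, emit only a reference to that file
    return dumper.represent_scalar(u'!xref', u'%s' % pid)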

class PersistingDumper(yaml.Dumper):
    pass

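# add_multi_representer (rather than add_representer, which matches only the
# exact type) also matches subclasses of Persistable, such as Foo and Bar below
PersistingDumper.add_multi_representer(Persistable, persistable_representer)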
my_yaml_dump = partial(yaml.dump, Dumper=PersistingDumper)

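def persistable_constructor(loader, node):
    xref = loader.construct_scalar(node)
    print("Reading from file: %s" % xref)
    # The referenced file holds a !!python/object document, which needs a
    # loader able to build Python objects (yaml.Loader); use trusted files only
    with open(xref, 'r') as infile:
        value = yaml.load(infile, Loader=yaml.Loader)
    return value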

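# Register the constructor on yaml.Loader, the loader used in the examples below
yaml.add_constructor(u'!xref', persistable_constructor, Loader=yaml.Loader)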


# example use, also serves as a test
class Foo(Persistable):
    def __init__(self):
        self.one = 1
        Persistable.__init__(self)

class Bar(Persistable):
    def __init__(self, foo):
        self.foo = foo
        Persistable.__init__(self)

foo = Foo()
bar = Bar(foo)
print("=== foo ===")
dumped_foo = my_yaml_dump(foo)
print(dumped_foo)
print(yaml.load(dumped_foo, Loader=yaml.Loader))
print(yaml.load(dumped_foo, Loader=yaml.Loader).one)

print("=== bar ===")
dumped_bar = my_yaml_dump(bar)
print(dumped_bar)
print(yaml.load(dumped_bar, Loader=yaml.Loader))
print(yaml.load(dumped_bar, Loader=yaml.Loader).foo)
print(yaml.load(dumped_bar, Loader=yaml.Loader).foo.one)

baz = Bar(Persistable())
print("=== baz ===")
dumped_baz = my_yaml_dump(baz)
print(dumped_baz)
print(yaml.load(dumped_baz, Loader=yaml.Loader))

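From now on, use my_yaml_dump instead of yaml.dump when you want to save instances of the Persistable class to separate files. But don't use it inside persistable_representer or persistable_constructor! (Inside the representer, it would trigger the representer again on the same object and recurse endlessly.) No special loading function is necessary; just use yaml.load. Recent PyYAML versions require an explicit loader, e.g. yaml.load(stream, Loader=yaml.Loader), and it must be one that can construct the Python object tags in the saved files.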
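If JSON is preferred over YAML, the same idea can be sketched with the standard json module: the `default` hook plays the role of the representer and `object_hook` that of the constructor. This is a hedged sketch, not part of the answer above. The `$xref` key, the `__class__`/`state` file layout, the `CLASSES` registry and the `.json` file names are all hypothetical choices. Because each object's file is written with the same `default` hook, a nested `Foo` here does get its own file.

```python
import json
import os
import tempfile


class Persistable:
    """Hypothetical base class: instances are stored in their own JSON file."""
    _counter = 0

    def __init__(self):
        Persistable._counter += 1
        self.persistent_id = "%s.%d" % (type(self).__name__, Persistable._counter)


class Foo(Persistable):
    def __init__(self, one):
        super().__init__()
        self.one = one


class Bar(Persistable):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo


# Hypothetical registry telling the loader which class to rebuild.
CLASSES = {"Foo": Foo, "Bar": Bar}


def dump(obj, directory):
    def default(o):
        # json calls this only for objects it cannot serialize natively
        if isinstance(o, Persistable):
            state = {"__class__": type(o).__name__, "state": vars(o)}
            path = os.path.join(directory, o.persistent_id + ".json")
            with open(path, "w") as f:
                # indent=2 keeps the file readable and hand-editable;
                # nested Persistables go through this same hook
                json.dump(state, f, indent=2, default=default)
            return {"$xref": o.persistent_id}
        raise TypeError("cannot serialize %r" % (o,))
    return json.dumps(obj, default=default)


def load(text, directory):
    def hook(d):
        # object_hook sees every decoded dict, innermost first
        if set(d) == {"$xref"}:
            with open(os.path.join(directory, d["$xref"] + ".json")) as f:
                return json.load(f, object_hook=hook)
        if "__class__" in d and "state" in d:
            cls = CLASSES[d["__class__"]]
            obj = cls.__new__(cls)
            obj.__dict__.update(d["state"])
            return obj
        return d
    return json.loads(text, object_hook=hook)


with tempfile.TemporaryDirectory() as tmp:
    bar = Bar(Foo(1))
    text = dump(bar, tmp)
    files = sorted(os.listdir(tmp))
    restored = load(text, tmp)

print(text)              # {"$xref": "Bar.2"}
print(files)             # ['Bar.2.json', 'Foo.1.json']
print(restored.foo.one)  # 1
```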

Phew, that took some work... I hope this helps!

answered Oct 12 '22 at 11:10 by taleinat
