Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python object persistence

I'm seeking advice about methods of implementing object persistence in Python. To be more precise, I wish to be able to link a Python object to a file in such a way that any Python process that opens a representation of that file shares the same information, any process can change its object and the changes will propagate to the other processes, and even if all processes "storing" the object are closed, the file will remain and can be re-opened by another process.

I found three main candidates for this in my distribution of Python - anydbm, pickle, and shelve (dbm appeared to be perfect, but it is Unix-only, and I am on Windows). However, they all have flaws:

  • anydbm can only handle a dictionary of string values (I'm seeking to store a list of dictionaries, all of which have string keys and string values, though ideally I would seek a module with no type restrictions)
  • shelve requires that a file be re-opened before changes propagate - for instance, if two processes A and B load the same file (containing a shelved empty list), and A adds an item to the list and calls sync(), B will still see the list as being empty until it reloads the file.
  • pickle (the module I am currently using for my test implementation) has the same "reload requirement" as shelve, and also does not overwrite previous data - if process A dumps fifteen empty strings onto a file, and then the string 'hello', process B will have to load the file sixteen times in order to get the 'hello' string. I am currently dealing with this problem by preceding any write operation with repeated reads until end of file ("wiping the slate clean before writing on it"), and by making every read operation repeated until end of file, but I feel there must be a better way.

My ideal module would behave as follows (with "A>>>" representing code executed by process A, and "B>>>" code executed by process B):

A>>> import imaginary_perfect_module as mod
B>>> import imaginary_perfect_module as mod
A>>> d = mod.load('a_file') 
B>>> d = mod.load('a_file')
A>>> d
{}
B>>> d
{}
A>>> d[1] = 'this string is one'
A>>> d['ones'] = 1   #anydbm would sulk here
A>>> d['ones'] = 11 
A>>> d['a dict'] = {'this dictionary' : 'is arbitrary', 42 : 'the answer'}
B>>> d['ones']   #shelve would raise a KeyError here, unless A had called d.sync() and B had reloaded d
11    #pickle (with different syntax) would have returned 1 here, and then 11 on next call
(etc. for B)

I could achieve this behaviour by creating my own module that uses pickle, and editing the dump and load behaviour so that they use the repeated reads I mentioned above - but I find it hard to believe that this problem has never occurred to, and been fixed by, more talented programmers before. Moreover, these repeated reads seem inefficient to me (though I must admit that my knowledge of operation complexity is limited, and it's possible that these repeated reads are going on "behind the scenes" in otherwise apparently smoother modules like shelve). Therefore, I conclude that I must be missing some code module that would solve the problem for me. I'd be grateful if anyone could point me in the right direction, or give advice about implementation.

like image 579
scubbo Avatar asked May 31 '12 09:05

scubbo


People also ask

How does Python store persistent data?

The standard library includes a variety of modules for persisting data. The most common pattern for storing data from Python objects for reuse is to serialize them with pickle and then either write them directly to a file or store them using one of the many key-value pair database formats available with the dbm API.

What is a shelf in Python?

A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle.

What do you mean by persistence of object explain by giving example?

example breafly. 0. A persistent object is an object that has been assigned a storage location in a federated database. When you commit the transaction in which you create a persistent object, that object's data is saved in the database; the object can then be accessed by other processes.


2 Answers

Use the ZODB (the Zope Object Database) instead. Backed with ZEO it fulfills your requirements:

  • Transparent persistence for Python objects

    ZODB uses pickles underneath so anything that is pickle-able can be stored in a ZODB object store.

  • Full ACID-compatible transaction support (including savepoints)

    This means changes from one process propagate to all the other processes when they are good and ready, and each process has a consistent view on the data throughout a transaction.

ZODB has been around for over a decade now, so you are right in surmising this problem has already been solved before. :-)

The ZODB let's you plug in storages; the most common format is the FileStorage, which stores everything in one Data.fs with an optional blob storage for large objects.

Some ZODB storages are wrappers around others to add functionality; DemoStorage for example keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.

ZEO is such a plugin that introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process only.

The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.

like image 67
Martijn Pieters Avatar answered Oct 02 '22 18:10

Martijn Pieters


For beginners, You can port your shelve databases to ZODB databases like this:

#!/usr/bin/env python
import shelve
import ZODB, ZODB.FileStorage
import transaction
from optparse import OptionParser
import os
import sys
import re

reload(sys)
sys.setdefaultencoding("utf-8")

parser = OptionParser()

parser.add_option("-o", "--output", dest = "out_file", default = False, help ="original shelve database filename")
parser.add_option("-i", "--input", dest = "in_file", default = False, help ="new zodb database filename")

parser.set_defaults()
options, args = parser.parse_args()

if options.in_file == False or options.out_file == False :
    print "Need input and output database filenames"
    exit(1)

db = shelve.open(options.in_file, writeback=True)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.iteritems() :
    print "Copying key: " + str(key)
    newdb[key] = value

transaction.commit()
like image 38
Michael Galaxy Avatar answered Oct 02 '22 17:10

Michael Galaxy