Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Incrementally marshal / pickle an object?

I have a large object I'd like to serialize to disk. I'm finding marshal works quite well and is nice and fast.

Right now I'm creating my large object then calling marshal.dump . I'd like to avoid holding the large object in memory if possible - I'd like to dump it incrementally as I build it. Is that possible?

The object is fairly simple, a dictionary of arrays.

like image 296
Parand Avatar asked Mar 01 '23 22:03

Parand


2 Answers

The bsddb module's 'hashopen' and 'btopen' functions provide a persistent dictionary-like interface. Perhaps you could use one of these, instead of a regular dictionary, to incrementally serialize the arrays to disk?

import bsddb
import marshal

db = bsddb.hashopen('file.db')
db['array1'] = marshal.dumps(array1)
db['array2'] = marshal.dumps(array2)
...
db.close()

To retrieve the arrays:

db = bsddb.hashopen('file.db')
array1 = marshal.loads(db['array1'])
...
like image 55
elo80ka Avatar answered Mar 07 '23 13:03

elo80ka


It all your object has to do is be a dictionary of lists, then you may be able to use the shelve module. It presents a dictionary-like interface where the keys and values are stored in a database file instead of in memory. One limitation which may or may not affect you is that keys in Shelf objects must be strings. Value storage will be more efficient if you specify protocol=-1 when creating the Shelf object to have it use a more efficient binary representation.

like image 30
Theran Avatar answered Mar 07 '23 15:03

Theran