
Best way to save complex Python data structures across program sessions (pickle, json, xml, database, other)

Looking for advice on the best technique for saving complex Python data structures across program sessions.

Here's a list of techniques I've come up with so far:

  • pickle/cPickle
  • json
  • jsonpickle
  • xml
  • database (like SQLite)

Pickle is the easiest and fastest technique, but my understanding is that there is no guarantee that pickle output will work across different versions of Python 2.x/3.x or across 32-bit and 64-bit implementations of Python.

JSON only works for simple data structures. jsonpickle seems to correct this AND seems to be written to work across different versions of Python.
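To illustrate the JSON limitation, here is a minimal sketch using only the standard library. The sample data and the `encode_extra` helper are made up for illustration; the point is that sets are rejected outright and tuples silently come back as lists.

```python
import json

# Made-up sample data containing types JSON has no native form for.
data = {"name": "example", "tags": {"a", "b"}, "point": (1, 2)}

error = None
try:
    json.dumps(data)
except TypeError as exc:
    # The set cannot be encoded by the default encoder.
    error = str(exc)

def encode_extra(obj):
    # Hypothetical fallback hook: store sets as sorted lists (lossy).
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError("cannot serialize " + type(obj).__name__)

text = json.dumps(data, default=encode_extra)
restored = json.loads(text)
# restored["tags"] is now a list and restored["point"] is a list, not a tuple.
```

So a plain JSON round trip changes the types unless you add your own encoding and decoding conventions, which is roughly the gap jsonpickle tries to fill.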

Serializing to XML or to a database is possible, but takes extra effort, since we would have to write the serialization ourselves.

Thank you, Malcolm

Malcolm asked Jan 05 '10 01:01



4 Answers

You have a misconception about pickles: they are guaranteed to work across Python versions. You simply have to choose a protocol version that is supported by all the Python versions you care about.
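A minimal sketch of choosing the protocol explicitly (the sample data is made up). Protocol 2 was added in Python 2.3 and is still readable by current Python 3 releases, so it is one reasonable choice if both 2.x and 3.x must read the file; differences such as str versus bytes between 2.x and 3.x still need care.

```python
import pickle

# Made-up nested structure: dicts, lists, tuples and a set.
data = {"users": [{"name": "Malcolm", "scores": (1, 2.5)}], "seen": {1, 2, 3}}

# Pin the protocol instead of relying on the interpreter's default.
blob = pickle.dumps(data, protocol=2)

# Any Python that supports protocol 2 can load this back.
restored = pickle.loads(blob)
```

To persist across sessions, the same calls work with files: `pickle.dump(data, f, protocol=2)` on a file opened in `"wb"` mode, and `pickle.load(f)` on one opened in `"rb"` mode.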

The technique you left out is marshal, which is not guaranteed to work across Python versions (and btw, is how .pyc files are written).

Ned Batchelder answered Nov 10 '22 13:11



You left out the marshal and shelve modules.

Also, this Python docs page covers persistence.
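For reference, a minimal sketch of shelve, which behaves like a persistent dictionary whose values are pickled for you. The file name and stored values here are made up, and the `with` form assumes Python 3.4 or later.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "session_state")

# First "session": store arbitrary picklable objects under string keys.
with shelve.open(path) as db:
    db["config"] = {"depth": 3, "tags": ["a", "b"]}
    db["history"] = [(1, "start"), (2, "stop")]

# Later "session": reopen the same file and read the values back.
with shelve.open(path) as db:
    config = db["config"]
    keys = sorted(db.keys())
```

Since shelve sits on top of pickle and a dbm backend, the same pickle compatibility considerations apply, and the on-disk format depends on which dbm implementation is available.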

SpliFF answered Nov 10 '22 12:11



Have you looked at PySyck or PyYAML?

rnicholson answered Nov 10 '22 13:11



What are your criteria for "best"?

  • pickle can do most Python structures, deeply nested ones too
  • SQLite databases can be easily queried (if you know SQL :)
  • speed / memory? Trust no benchmarks that you haven't faked yourself.
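As a rough sketch of the "easily queried" point, using the standard sqlite3 module. The table layout and the rows are made-up sample values, not real figures.

```python
import sqlite3

# ":memory:" keeps this self-contained; pass a file path to persist across sessions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")

# Made-up illustrative rows.
rows = [
    ("Oslo", "Norway", 700000),
    ("Bergen", "Norway", 290000),
    ("Lyon", "France", 520000),
]
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", rows)
conn.commit()

# Query without loading the whole data set into Python objects first.
norway = conn.execute(
    "SELECT name FROM cities WHERE country = ? ORDER BY population DESC",
    ("Norway",),
).fetchall()
conn.close()
```

The trade-off is the one the question already mentions: you have to map your structures onto tables yourself, whereas pickle takes them as they are.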

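(Fine print:
cPickle.dump(..., protocol=-1) selects the highest binary protocol, which is far more compact than the Python 2 default text protocol 0; in one case a 15M pickle vs. a 60M sqlite file, but it can break.
Strings that occur many times, e.g. country names, may take more memory than you expect; see the builtin intern() (sys.intern() in Python 3).
)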
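A small sketch of both fine-print points, written for Python 3 (where cPickle is folded into pickle and intern() lives in sys). The records are made up, and the actual sizes will vary with your data, so measure your own.

```python
import pickle
import sys

# Made-up records with a country name repeated in every row.
records = [{"id": i, "country": "Norway", "value": i * 0.5} for i in range(1000)]

# Protocol 0 is the old ASCII format; -1 means "highest protocol available".
text_size = len(pickle.dumps(records, protocol=0))
binary_size = len(pickle.dumps(records, protocol=-1))
round_trip_ok = pickle.loads(pickle.dumps(records, protocol=-1)) == records

# Strings built at runtime are separate objects unless interned.
a = sys.intern("".join(["Nor", "way"]))
b = sys.intern("".join(["Nor", "way"]))
same_object = a is b  # both names now refer to one shared string
```

Comparing `text_size` with `binary_size` on your own data shows how much the protocol choice matters; interning helps when the same string value is created many times, for example while parsing input.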

denis answered Nov 10 '22 12:11
