Saving dictionaries to file (numpy and Python 2/3 friendly)

I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any kind of dictionary structure that may contain other dictionaries, numpy arrays, serializable Python objects, and so forth. On top of that, I want numpy arrays to be stored space-efficiently, and the files to work across Python 2 and 3.

Below are methods I know are out there. My question is what is missing from this list and is there an alternative that dodges all my deal-breakers?

  • Python's pickle module (deal-breaker: inflates the size of numpy arrays a lot)
  • Numpy's save/savez/load (deal-breaker: incompatible format across Python 2/3)
  • PyTables replacement for numpy.savez (deal-breaker: only handles numpy arrays)
  • Using PyTables manually (deal-breaker: I want this for constantly changing research code, so it's really convenient to be able to dump dictionaries to files by calling a single function)
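Much of the Python 2/3 trouble with `np.save`/`np.load` comes from object arrays and other pickled content. A Python 2 `str` is a byte string, so Python 3 has to be told how to decode it when unpickling. The stdlib-only sketch below shows that decision with `pickle.loads` and its `encoding` argument. The pickle bytes are hand-written to match what Python 2's `pickle.dumps('\xe9', 2)` emits, and the helper name `load_py2_pickle` is hypothetical.

```python
import pickle

# Hand-written to match what Python 2's pickle.dumps('\xe9', 2) emits for the
# 1-byte str '\xe9': PROTO 2, SHORT_BINSTRING (len 1), BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x01\xe9q\x00."


def load_py2_pickle(data, encoding):
    """Unpickle Python 2 data under Python 3 with an explicit str decoding."""
    return pickle.loads(data, encoding=encoding)


if __name__ == "__main__":
    try:
        load_py2_pickle(PY2_PICKLE, "ASCII")  # pickle's default encoding
    except UnicodeDecodeError as exc:
        print("ASCII failed:", exc.reason)
    print("latin1:", ascii(load_py2_pickle(PY2_PICKLE, "latin1")))  # '\xe9'
    print("bytes:", load_py2_pickle(PY2_PICKLE, "bytes"))  # b'\xe9'
```

`np.load` exposes the same choice through its `encoding` parameter, which accepts only `'ASCII'`, `'latin1'` and `'bytes'`. Reading object arrays also requires `allow_pickle=True`.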

The PyTables replacement of numpy.savez is promising, since I like the idea of using hdf5 and it compresses the numpy arrays really efficiently, which is a big plus. However, it does not take any type of dictionary structure.

Lately, I have been using something similar to the PyTables replacement, extended so it can store any type of entry. This actually works pretty well, but I find myself storing primitive data types in length-1 CArrays. That is a bit awkward and ambiguous with actual length-1 arrays, even though I set chunksize to 1 so they don't take up much space.
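Here is a minimal, pure-Python 3 sketch of the step such an approach needs. It flattens a nested dictionary into HDF5-style `/a/b` paths and tags each leaf with an explicit kind, so a scalar can never be confused with a length-1 array. The `flatten`/`unflatten` names and the `"group"`/`"array"`/`"scalar"` tags are hypothetical. Plain lists stand in for numpy arrays, and actually writing the entries to an HDF5 file (for example one PyTables node per path, with the kind stored as a node attribute) is deliberately left out.

```python
"""Flatten a nested dict into HDF5-style paths with explicit leaf kinds (sketch)."""


def flatten(tree, prefix=""):
    """Map each node of a nested dict to (kind, value), keyed by a '/a/b' path."""
    entries = {}
    for key, value in tree.items():
        # HDF5 node names are '/'-separated, so keys must be non-empty,
        # '/'-free strings.
        if not isinstance(key, str) or not key or "/" in key:
            raise ValueError("unsupported key: %r" % (key,))
        path = prefix + "/" + key
        if isinstance(value, dict):
            # Record groups explicitly so empty dicts survive the round trip.
            entries[path] = ("group", None)
            entries.update(flatten(value, path))
        elif isinstance(value, (list, tuple)):
            # Stand-in for a numpy array; tuples come back as lists.
            entries[path] = ("array", list(value))
        else:
            # Tagged as a scalar, so it is never mistaken for a
            # length-1 array when read back.
            entries[path] = ("scalar", value)
    return entries


def unflatten(entries):
    """Rebuild the nested dict; shallower paths are created before deeper ones."""
    root = {}
    for path in sorted(entries, key=lambda p: p.count("/")):
        kind, value = entries[path]
        parts = path.strip("/").split("/")
        node = root
        for name in parts[:-1]:
            node = node[name]
        node[parts[-1]] = {} if kind == "group" else value
    return root


if __name__ == "__main__":
    data = {"lr": 0.1, "w": [0.5], "meta": {"name": "run1", "extra": {}}}
    flat = flatten(data)
    for path in sorted(flat):
        print(path, flat[path])
    print(unflatten(flat) == data)  # True
```

Under these assumptions, persisting `flat` amounts to writing one node per entry with its kind alongside it. Reading reverses the process, which keeps the single save/load call pair the question asks for.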

Is there something like that already out there?

Thanks!

Asked Aug 06 '13 02:08 by Gustav Larsson



1 Answer

After asking this two years ago, I started coding my own HDF5-based replacement for pickle/np.save. It has since matured into a stable package, so I thought I would finally answer and accept my own question, because it is by design exactly what I was looking for:

  • https://github.com/uchicago-cs/deepdish
Answered Oct 19 '22 08:10 by Gustav Larsson