How to save big (not huge) dictionaries in Python?

My dictionary will consist of several thousand keys, with each key having a 1000x1000 numpy array as its value. I don't need the file to be human readable. Small size and fast loading times are more important.

First I tried savemat, but I ran into problems. Pickle resulted in a huge file, and I assume the same would be true for csv. I've read posts recommending json (readable text, so probably huge) or a database (presumably complicated). What would you recommend for my case?

asked Dec 21 '22 by Framester

2 Answers

If you have a dictionary where the keys are strings and the values are arrays, like this:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]),
            'b': numpy.array([3,4]),
            'c': numpy.array([5,6])}

You can use numpy.savez to save them, by key, to a single uncompressed .npz file (use numpy.savez_compressed if you want compression):

>>> numpy.savez('file.npz', **arrs)

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
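Since the asker cares about small file size, it may be worth noting that plain savez stores arrays uncompressed; savez_compressed writes the same .npz format with zip deflate compression, at the cost of slower save/load. A minimal round-trip sketch (the key names and array sizes here are just placeholders):

```python
import tempfile, os
import numpy as np

# Stand-in for the asker's dict of several thousand 1000x1000 arrays
arrs = {'k%d' % i: np.random.rand(10, 10) for i in range(3)}

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'file_compressed.npz')

# Same .npz container as savez, but each array is deflate-compressed
np.savez_compressed(path, **arrs)

# Arrays are read back lazily, by key, from the reopened file
with np.load(path) as npz:
    restored = {key: npz[key] for key in npz.files}

assert all(np.array_equal(arrs[k], restored[k]) for k in arrs)
```

Compression helps most when the arrays contain repetitive values; for dense random floats the size savings are small, so it is worth measuring on real data before committing to it.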
answered Dec 26 '22 by jterrace

The filesystem itself is often an underappreciated data structure. You could keep a dictionary that maps your keys to filenames, with each file holding one 1000x1000 array. Pickling that small index dictionary is quick and easy, and the data files themselves can contain raw array data, which numpy loads efficiently.
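This approach could look something like the sketch below. The layout (one .npy file per key plus a pickled index file) and all the names are just one possible arrangement, not something from the answer itself:

```python
import os
import pickle
import tempfile
import numpy as np

data_dir = tempfile.mkdtemp()

# Stand-in data; the asker's arrays would be 1000x1000
arrays = {'a': np.arange(4).reshape(2, 2), 'b': np.eye(2)}

# Save each array to its own .npy file and record the path in an index
index = {}
for key, arr in arrays.items():
    path = os.path.join(data_dir, key + '.npy')
    np.save(path, arr)          # raw binary, fast to load with np.load
    index[key] = path

# The index dict is tiny, so pickling it is cheap
index_path = os.path.join(data_dir, 'index.pkl')
with open(index_path, 'wb') as f:
    pickle.dump(index, f)

# Later: unpickle the index, then load only the arrays you need
with open(index_path, 'rb') as f:
    loaded_index = pickle.load(f)
restored_a = np.load(loaded_index['a'])
```

A nice property of this layout is lazy loading: unlike unpickling one giant dictionary, you only pay the I/O cost for the arrays you actually access.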

answered Dec 26 '22 by Greg Hewgill