How to save big (not huge) dictionaries in Python?

My dictionary will consist of several thousand keys, with each key having a 1000x1000 numpy array as its value. I don't need the file to be human readable. Small size and fast loading times are more important.

First I tried savemat, but I ran into problems. Pickle resulted in a huge file, and I assume the same would be true for csv. I've read posts recommending json (readable text, so probably huge) or a database (presumably complicated). What would you recommend for my case?

asked Dec 21 '22 by Framester

2 Answers

If you have a dictionary where the keys are strings and the values are arrays, like this:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]),
            'b': numpy.array([3,4]),
            'c': numpy.array([5,6])}

You can use numpy.savez to save them, by key, to a single uncompressed .npz file (use numpy.savez_compressed if you want compression):

>>> numpy.savez('file.npz', **arrs)

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
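Since the asker cares about small file size, it may be worth noting that plain savez stores arrays uncompressed; savez_compressed writes the same .npz format with zip deflate compression, at the cost of slower save/load. A minimal round-trip sketch (the key names and array sizes here are just placeholders):

```python
import tempfile, os
import numpy as np

# Stand-in for the asker's dict of several thousand 1000x1000 arrays
arrs = {'k%d' % i: np.random.rand(10, 10) for i in range(3)}

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'file_compressed.npz')

# Same .npz container as savez, but each array is deflate-compressed
np.savez_compressed(path, **arrs)

# Arrays are read back lazily, by key, from the reopened file
with np.load(path) as npz:
    restored = {key: npz[key] for key in npz.files}

assert all(np.array_equal(arrs[k], restored[k]) for k in arrs)
```

Compression helps most when the arrays contain repetitive values; for dense random floats the size savings are small, so it is worth measuring on real data before committing to it.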
answered Dec 26 '22 by jterrace

The filesystem itself is often an underappreciated data structure. You could keep a dictionary that maps your keys to filenames, with each file holding one 1000x1000 array. Pickling that small index dictionary is quick and easy, and the data files themselves can contain raw array data, which numpy loads efficiently.
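This approach could look something like the sketch below. The layout (one .npy file per key plus a pickled index file) and all the names are just one possible arrangement, not something from the answer itself:

```python
import os
import pickle
import tempfile
import numpy as np

data_dir = tempfile.mkdtemp()

# Stand-in data; the asker's arrays would be 1000x1000
arrays = {'a': np.arange(4).reshape(2, 2), 'b': np.eye(2)}

# Save each array to its own .npy file and record the path in an index
index = {}
for key, arr in arrays.items():
    path = os.path.join(data_dir, key + '.npy')
    np.save(path, arr)          # raw binary, fast to load with np.load
    index[key] = path

# The index dict is tiny, so pickling it is cheap
index_path = os.path.join(data_dir, 'index.pkl')
with open(index_path, 'wb') as f:
    pickle.dump(index, f)

# Later: unpickle the index, then load only the arrays you need
with open(index_path, 'rb') as f:
    loaded_index = pickle.load(f)
restored_a = np.load(loaded_index['a'])
```

A nice property of this layout is lazy loading: unlike unpickling one giant dictionary, you only pay the I/O cost for the arrays you actually access.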

answered Dec 26 '22 by Greg Hewgill