My dictionary will consist of several thousand keys, with each key having a 1000x1000 numpy array as its value. I don't need the file to be human readable; small size and fast loading times are more important.
First I tried savemat, but I ran into problems. Pickle resulted in a huge file, and I assume csv would be the same. I've read posts recommending json (readable text, so probably huge) or a database (presumably complicated). What would you recommend for my case?
If you have a dictionary where the keys are strings and the values are arrays, like this:
>>> import numpy
>>> arrs = {'a': numpy.array([1, 2]),
...         'b': numpy.array([3, 4]),
...         'c': numpy.array([5, 6])}
You can use numpy.savez to save them, by key, to a single .npz archive (note that numpy.savez itself is uncompressed; use numpy.savez_compressed if you want compression):
>>> numpy.savez('file.npz', **arrs)
To load it back:
>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
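Since small file size matters in your case, here is the same round trip sketched with numpy.savez_compressed, which writes a zip-compressed archive with the same calling convention; the file name file_compressed.npz and the dict comprehension for pulling everything back into memory are just illustrative choices:
>>> numpy.savez_compressed('file_compressed.npz', **arrs)
>>> with numpy.load('file_compressed.npz') as npzfile:
...     # NpzFile loads arrays lazily; build a plain dict to have them all in memory
...     loaded = {key: npzfile[key] for key in npzfile.files}
...
>>> loaded['a']
array([1, 2])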
The filesystem itself is often an underappreciated data structure. You could have a dictionary that is a map from your keys to filenames, and then each file has the 1000x1000 array in it. Pickling the dictionary would be quick and easy, and then the data files can just contain raw data (which numpy can easily load).
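A minimal sketch of that approach, assuming one .npy file per key plus a small pickled index mapping keys to filenames (the names index.pkl and key + '.npy' are just illustrative):
>>> import pickle, numpy
>>> data = {'a': numpy.random.rand(1000, 1000),
...         'b': numpy.random.rand(1000, 1000)}
>>> index = {}
>>> for key, arr in data.items():
...     filename = key + '.npy'
...     numpy.save(filename, arr)  # raw binary .npy file per array
...     index[key] = filename
...
>>> with open('index.pkl', 'wb') as f:  # the pickle stays tiny: just key -> filename
...     pickle.dump(index, f)
...
>>> with open('index.pkl', 'rb') as f:
...     index = pickle.load(f)
...
>>> numpy.load(index['a']).shape  # load any single array on demand
(1000, 1000)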