I have a large data set (millions of rows) in memory, in the form of numpy arrays and dictionaries.
Once this data is constructed, I want to store it in files so that I can later load it into memory quickly, without reconstructing it from scratch.
The np.save and np.load functions do the job smoothly for numpy arrays.
But I am facing problems with dict objects.
See the sample below. d2 is the dictionary that was loaded from the file. As Out[28] shows, it has been loaded into d2 as a numpy array, not as a dict, so further dict operations such as get do not work.
Is there a way to load the data from the file as a dict (instead of a numpy array)?
    In [25]: d1={'key1':[5,10], 'key2':[50,100]}

    In [26]: np.save("d1.npy", d1)

    In [27]: d2=np.load("d1.npy")

    In [28]: d2
    Out[28]: array({'key2': [50, 100], 'key1': [5, 10]}, dtype=object)

    In [30]: d1.get('key1')  #original dict before saving into file
    Out[30]: [5, 10]

    In [31]: d2.get('key2')  #dictionary loaded from the file
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-31-23e02e45bf22> in <module>()
    ----> 1 d2.get('key2')

    AttributeError: 'numpy.ndarray' object has no attribute 'get'
The loaded value is a zero-dimensional object array (note dtype=object in Out[28]) that wraps the dict. Use d2.item() to retrieve the actual dict object first:
    import numpy as np

    d1 = {'key1': [5, 10], 'key2': [50, 100]}
    np.save("d1.npy", d1)
    # NumPy 1.16.3 and later need allow_pickle=True to load object arrays
    d2 = np.load("d1.npy", allow_pickle=True)
    print(d1.get('key1'))
    print(d2.item().get('key2'))
result:
    [5, 10]
    [50, 100]
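To see why .item() is needed, it can help to inspect what np.load actually returns. The following is a minimal sketch, not part of the original answer; the temporary directory and the variable name `loaded` are assumptions made for illustration, and allow_pickle=True is assumed to be required (it is on NumPy 1.16.3 and later).

```python
import os
import tempfile

import numpy as np

d1 = {'key1': [5, 10], 'key2': [50, 100]}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "d1.npy")
    np.save(path, d1)  # the dict is pickled inside a 0-d object array
    # allow_pickle defaults to False on NumPy >= 1.16.3, so pass it explicitly.
    loaded = np.load(path, allow_pickle=True)

print(loaded.shape)   # ()  -> zero dimensions, exactly one element
print(loaded.dtype)   # object
d2 = loaded.item()    # unwrap that single element
print(type(d2))       # <class 'dict'>
print(d2 == d1)       # True
```

Because the array has no dimensions, there is nothing to index into; .item() is the call that hands back the one Python object stored inside it.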
The pickle module can also be used. Example code:
    from __future__ import print_function  # must come before any other import
    from six.moves import cPickle as pickle  # for performance on Python 2
    import numpy as np

    def save_dict(di_, filename_):
        with open(filename_, 'wb') as f:
            pickle.dump(di_, f)

    def load_dict(filename_):
        with open(filename_, 'rb') as f:
            ret_di = pickle.load(f)
        return ret_di

    if __name__ == '__main__':
        g_data = {
            'm': np.random.rand(4, 4),
            'n': np.random.rand(2, 2, 2)
        }
        save_dict(g_data, './data.pkl')
        g_data2 = load_dict('./data.pkl')
        # element-wise comparison of the saved and reloaded arrays
        print(g_data['m'] == g_data2['m'])
        print(g_data['n'] == g_data2['n'])
You may also save multiple Python objects in a single pickled file. Each pickle.load call will load a single object in that case.
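As a rough sketch of the multiple-objects case: each pickle.dump call appends one object to the open file, and each pickle.load call reads one back, raising EOFError once nothing is left. The helper names save_objects and load_objects, and the temporary file path, are hypothetical and chosen only for this illustration.

```python
import os
import pickle
import tempfile

import numpy as np


def save_objects(objects, filename):
    """Pickle each object in `objects` into one file, back to back."""
    with open(filename, 'wb') as f:
        for obj in objects:
            pickle.dump(obj, f)


def load_objects(filename):
    """Call pickle.load repeatedly until the file is exhausted."""
    loaded = []
    with open(filename, 'rb') as f:
        while True:
            try:
                loaded.append(pickle.load(f))  # one object per call
            except EOFError:
                break  # no more pickled objects in the file
    return loaded


# Usage: store a dict and a numpy array in the same file, then read both back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.pkl')
    save_objects([{'key1': [5, 10]}, np.arange(3)], path)
    first, second = load_objects(path)

print(first)   # {'key1': [5, 10]}
print(second)  # [0 1 2]
```

Objects come back in the order they were written, so the reader has to know (or store) that order; wrapping everything in a single dict or tuple and pickling it once avoids that bookkeeping.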