
Python : save dictionaries through numpy.save [duplicate]

I have a large data set (millions of rows) in memory, in the form of numpy arrays and dictionaries.

Once this data is constructed, I want to store it in files so that I can later load those files into memory quickly, without reconstructing the data from scratch.

The np.save and np.load functions do the job smoothly for numpy arrays, but I am facing problems with dict objects.

See the sample below. d2 is the dictionary that was loaded from the file. As Out[28] shows, it has been loaded into d2 as a numpy array, not as a dict, so dict operations such as get do not work.

Is there a way to load the data from the file as a dict (instead of a numpy array)?

    In [25]: d1={'key1':[5,10], 'key2':[50,100]}

    In [26]: np.save("d1.npy", d1)

    In [27]: d2=np.load("d1.npy")

    In [28]: d2
    Out[28]: array({'key2': [50, 100], 'key1': [5, 10]}, dtype=object)

    In [30]: d1.get('key1')  #original dict before saving into file
    Out[30]: [5, 10]

    In [31]: d2.get('key2')  #dictionary loaded from the file
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-31-23e02e45bf22> in <module>()
    ----> 1 d2.get('key2')

    AttributeError: 'numpy.ndarray' object has no attribute 'get'
ramu asked Oct 24 '16 13:10


2 Answers

np.save wraps the dict in a 0-dimensional object array, which is what np.load returns. Use d2.item() to retrieve the actual dict object first:

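    import numpy as np

    d1 = {'key1': [5, 10], 'key2': [50, 100]}
    np.save("d1.npy", d1)
    # NumPy 1.16.3 and later require allow_pickle=True to load object arrays
    d2 = np.load("d1.npy", allow_pickle=True)
    print(d1.get('key1'))
    print(d2.item().get('key2'))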

result:

    [5, 10]
    [50, 100]
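To see why .item() is needed, here is a minimal sketch that inspects what np.load hands back. It assumes NumPy 1.16.3 or later (where allow_pickle=True must be passed to load object arrays), and the temporary file path is a hypothetical location chosen only for illustration.

```python
import os
import tempfile

import numpy as np

d1 = {'key1': [5, 10], 'key2': [50, 100]}

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "d1.npy")

# np.save converts its argument to an array first; a dict becomes a
# 0-dimensional array of dtype object holding the dict as its one element.
np.save(path, d1)

# Object arrays are stored via pickle, so loading has to opt in explicitly.
loaded = np.load(path, allow_pickle=True)

print(loaded.shape)  # empty tuple: zero dimensions, a single element
print(loaded.dtype)  # object

# .item() extracts that single element as a plain Python object.
d2 = loaded.item()
print(type(d2).__name__)
print(d2.get('key2'))
```

Because the array has no dimensions, there is nothing to index into; .item() is the direct way to pull out the one stored object.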
Kennet Celeste answered Sep 25 '22 18:09


The pickle module can be used. Example code:

    import pickle  # Python 3's pickle already uses the C implementation

    import numpy as np


    def save_dict(di_, filename_):
        """Pickle the dict di_ into the file filename_."""
        with open(filename_, 'wb') as f:
            pickle.dump(di_, f)


    def load_dict(filename_):
        """Load and return the object pickled in filename_."""
        with open(filename_, 'rb') as f:
            ret_di = pickle.load(f)
        return ret_di


    if __name__ == '__main__':
        g_data = {
            'm': np.random.rand(4, 4),
            'n': np.random.rand(2, 2, 2)
        }
        save_dict(g_data, './data.pkl')
        g_data2 = load_dict('./data.pkl')
        # Element-wise comparison: each print shows an array of booleans
        print(g_data['m'] == g_data2['m'])
        print(g_data['n'] == g_data2['n'])

You can also save multiple Python objects in a single pickle file. In that case, each pickle.load call loads a single object, in the order the objects were written.
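Storing several objects in one pickle file could look like the following sketch. The variable names and the temporary file path are hypothetical, chosen only for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "objects.pkl")

lookup = {'key1': [5, 10], 'key2': [50, 100]}
matrix = np.arange(6).reshape(2, 3)

# Each dump call writes one pickled object after the previous one.
with open(path, 'wb') as f:
    pickle.dump(lookup, f)
    pickle.dump(matrix, f)

# Each load call reads back one object, in the order they were written.
with open(path, 'rb') as f:
    lookup2 = pickle.load(f)
    matrix2 = pickle.load(f)

print(lookup2 == lookup)
print(np.array_equal(matrix2, matrix))
```

Calling pickle.load again after the last object has been read raises EOFError, so the reader needs to know how many objects to expect, or catch that exception.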

Kh40tiK answered Sep 22 '22 18:09