I used cPickle and protocol version 2 to dump some computation results. The code looks like this:
> f = open('foo.pck', 'wb')
> cPickle.dump(var, f, protocol=2)
> f.close()
The variable var is a tuple of length two: var[0] is a list and var[1] is a numpy.ndarray.
The code above successfully generated a large file (~1.7 GB).
However, when I tried to load the variable from foo.pck, I got the following error.
ValueError Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-fd3ecce18dcd> in <module>()
----> 1 v = cPickle.load(f)
ValueError: buffer size does not match array size
The loading code looks like this:
> f = open('foo.pck', 'rb')
> v = cPickle.load(f)
I also tried to use pickle (instead of cPickle) to load the variable, but got a similar error message:
ValueError Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-aa6586c8e4bf> in <module>()
----> 1 v = pickle.load(f)
/usr/lib64/python2.6/pickle.pyc in load(file)
1368
1369 def load(file):
-> 1370 return Unpickler(file).load()
1371
1372 def loads(str):
/usr/lib64/python2.6/pickle.pyc in load(self)
856 while 1:
857 key = read(1)
--> 858 dispatch[key](self)
859 except _Stop, stopinst:
860 return stopinst.value
/usr/lib64/python2.6/pickle.pyc in load_build(self)
1215 setstate = getattr(inst, "__setstate__", None)
1216 if setstate:
-> 1217 setstate(state)
1218 return
1219 slotstate = None
ValueError: buffer size does not match array size
I tried the same code with much smaller data and it worked fine, so my best guess is that I hit a loading size limitation of pickle (or cPickle). It is strange, however, that the dump succeeds for a large variable while the load fails.
If this is indeed a loading size limitation problem, how should I bypass it? If not, what can be the possible cause of the problem?
Any suggestion is appreciated. Thanks!
How about saving and loading the numpy array with numpy.save() and np.load()?
You can save the pickled list and the numpy array to the same file:
import numpy as np
import cPickle
data = np.random.rand(50000000)
f = open('foo.pck', 'wb')
cPickle.dump([1,2,3], f, protocol=2)
np.save(f, data)
f.close()
to read the data:
import cPickle
import numpy as np
f = open('foo.pck', 'rb')
v = cPickle.load(f)
data = np.load(f)
print data.shape, data
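If moving to Python 3 is an option, this class of failure generally goes away: pickle protocol 4 (available since Python 3.4, per PEP 3154) uses 64-bit length framing and explicitly supports objects larger than 4 GB, so the tuple can be pickled and unpickled directly. A minimal sketch of that round-trip, using a small stand-in for the real data (the file path and array size here are illustrative):

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the real (list, ndarray) tuple; the actual array was ~1.7 GB.
var = ([1, 2, 3], np.arange(10, dtype=np.float64))

path = os.path.join(tempfile.gettempdir(), 'foo.pck')

# Protocol 4+ uses 64-bit length fields, so very large objects pickle cleanly.
with open(path, 'wb') as f:
    pickle.dump(var, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    v = pickle.load(f)

print(v[0], v[1].shape)  # [1, 2, 3] (10,)
```

numpy.savez() is another option for bundling several arrays into one file, though it stores everything as arrays rather than preserving the original list.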