Unable to load a previously dumped pickle file of large size in Python

I used cPickle and protocol version 2 to dump some computation results. The code looks like this:

f = open('foo.pck', 'w')
cPickle.dump(var, f, protocol=2)
f.close()

The variable var is a tuple of length two: var[0] is a list and var[1] is a numpy.ndarray.
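
For illustration, something with the same structure as var would look like this (the actual contents are my computation results; the values here are only placeholders):

import numpy as np
var = (range(1000), np.random.rand(50000000))  # a list and a large numpy.ndarray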

The above code segment successfully generated a large file (~1.7 GB).

However, when I tried to load the variable from foo.pck, I got the following error.

ValueError                                Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-fd3ecce18dcd> in <module>()
----> 1 v = cPickle.load(f)
ValueError: buffer size does not match array size

The loading code looks like the following.

f = open('foo.pck', 'r')
v = cPickle.load(f)

I also tried to use pickle (instead of cPickle) to load the variable, but got a similar error message, shown below.

ValueError                                Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-aa6586c8e4bf> in <module>()
----> 1 v = pickle.load(f)

/usr/lib64/python2.6/pickle.pyc in load(file)
   1368 
   1369 def load(file):
-> 1370     return Unpickler(file).load()
   1371 
   1372 def loads(str):

/usr/lib64/python2.6/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/usr/lib64/python2.6/pickle.pyc in load_build(self)
   1215         setstate = getattr(inst, "__setstate__", None)
   1216         if setstate:
-> 1217             setstate(state)
   1218             return
   1219         slotstate = None

ValueError: buffer size does not match array size

I tried the same code segments on much smaller data and everything worked fine, so my best guess is that I have hit a size limit when loading with pickle (or cPickle). However, it is strange that the dump succeeded (with the large variable) while the load failed.

If this is indeed a loading size limitation, how can I work around it? If not, what could be causing the problem?

Any suggestion is appreciated. Thanks!

asked Aug 21 '12 by user1036719



1 Answer

How about saving and loading the numpy array with numpy.save() and np.load()?

You can save the pickled list and the numpy array to the same file:

import numpy as np
import cPickle
data = np.random.rand(50000000)
f = open('foo.pck', 'wb')
cPickle.dump([1,2,3], f, protocol=2)
np.save(f, data)
f.close()

To read the data back:

import cPickle
import numpy as np
f = open('foo.pck', 'rb')
v = cPickle.load(f)
data = np.load(f)
print data.shape, data
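
np.save writes a self-contained .npy block into the already-open file, so as long as you read in the same order you wrote, both parts come back intact. If it helps, here is a small sketch of the same idea wrapped in a pair of helpers (the names save_result/load_result are just for illustration):

import cPickle
import numpy as np

def save_result(path, meta, arr):
    # pickle the small Python object first, then append the array as an .npy block
    with open(path, 'wb') as f:
        cPickle.dump(meta, f, protocol=2)
        np.save(f, arr)

def load_result(path):
    # read back in the same order: the pickled object first, then the array
    with open(path, 'rb') as f:
        meta = cPickle.load(f)
        arr = np.load(f)
    return meta, arr

save_result('foo.pck', [1, 2, 3], np.random.rand(1000))
meta, arr = load_result('foo.pck')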

answered Oct 15 '22 by HYRY