I am new to Python. I am adapting someone else's code from Python 2.X to 3.5. The code loads a file via cPickle. I changed all "cPickle" occurrences to "pickle" as I understand pickle superceded cPickle in 3.5. I get this execution error:
NameError: name 'cPickle' is not defined
Pertinent code:
import pickle
import gzip
...
def load_data():
f = gzip.open('../data/mnist.pkl.gz', 'rb')
training_data, validation_data, test_data = pickle.load(f, fix_imports=True)
f.close()
return (training_data, validation_data, test_data)
The error occurs in the pickle.load
line when load_data()
is called by another function. However, a) neither cPickle
or cpickle
no longer appear in any source files anywhere in the project (searched globally) and b) the error does not occur if I run the lines within load_data()
individually in the Python shell (however, I do get another data format error). Is pickle
calling cPickle
, and if so how do I stop it?
Shell: Python 3.5.0 |Anaconda 2.4.0 (x86_64)| (default, Oct 20 2015, 14:39:26) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin
IDE: IntelliJ 15.0.1, Python 3.5.0, anaconda
Unclear how to proceed. Any help appreciated. Thanks.
The pickle data format is standardized, so strings serialized with pickle can be deserialized with cPickle and vice versa. The main difference between cPickle and pickle is performance. The cPickle module is many times faster to execute because it's written in C and because its methods are functions instead of classes.
In Python 2, cPickle is the accelerated version of pickle, and a later addition to the standard library. It was perfectly normal to import it with a fallback to pickle. In Python 3 the accelerated version has been integrated and there is simply no reason to use anything other than import pickle .
The cPickle module helps us by implementing an algorithm for turning an arbitrary python object into a series of Bytes. The Pickle module also carries a similar type of program. But the difference between the 2 is that cPickle is much faster and implements the algorithm in C.
Pickle on the other hand is slow, insecure, and can be only parsed in Python. The only real advantage to pickle is that it can serialize arbitrary Python objects, whereas both JSON and MessagePack have limits on the type of data they can write out.
Actually, if you have pickled objects from python2.x
, then generally can be read by python3.x
. Also, if you have pickled objects from python3.x
, they generally can be read by python2.x
, but only if they were dumped with a protocol
set to 2
or less.
Python 2.7.10 (default, Sep 2 2015, 17:36:25)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> x = [1,2,3,4,5]
>>> import math
>>> y = math.sin
>>>
>>> import pickle
>>> f = open('foo.pik', 'w')
>>> pickle.dump(x, f)
>>> pickle.dump(y, f)
>>> f.close()
>>>
dude@hilbert>$ python3.5
Python 3.5.0 (default, Sep 15 2015, 23:57:10)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('foo.pik', 'rb') as f:
... x = pickle.load(f)
... y = pickle.load(f)
...
>>> x
[1, 2, 3, 4, 5]
>>> y
<built-in function sin>
Also, if you are looking for cPickle
, it's now _pickle
, not pickle
.
>>> import _pickle
>>> _pickle
<module '_pickle' from '/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload/_pickle.cpython-35m-darwin.so'>
>>>
You also asked how to stop pickle
from using the built-in (C++) version. You can do this by using _dump
and _load
, or the _Pickler
class if you like to work with the class objects. Confused? The old cPickle
is now _pickle
, however dump
, load
, dumps
, and loads
all point to _pickle
… while _dump
, _load
, _dumps
, and _loads
point to the pure python version. For instance:
>>> import pickle
>>> # _dumps is a python function
>>> pickle._dumps
<function _dumps at 0x109c836a8>
>>> # dumps is a built-in (C++)
>>> pickle.dumps
<built-in function dumps>
>>> # the Pickler points to _pickle (C++)
>>> pickle.Pickler
<class '_pickle.Pickler'>
>>> # the _Pickler points to pickle (pure python)
>>> pickle._Pickler
<class 'pickle._Pickler'>
>>>
So if you don't want to use the built-in version, then you can use pickle._loads
and the like.
It's looking like the pickled data that you're trying to load was generated by a version of the program that was running on Python 2.7. The data is what contains the references to cPickle
.
The problem is that Pickle, as a serialization format, assumes that your standard library (and to a lesser extent your code) won't change layout between serialization and deserialization. Which it did -- a lot -- between Python 2 and 3. And when that happens, Pickle has no path for migration.
Do you have access to the program that generated mnist.pkl.gz
? If so, port it to Python 3 and re-run it to regenerate a Python 3-compatible version of the file.
If not, you'll have to write a Python 2 program that loads that file and exports it to a format that can be loaded from Python 3 (depending on the shape of your data, JSON and CSV are popular choices), then write a Python 3 program that loads that format then dumps it as Python 3 pickle. You can then load that Pickle file from your original program.
Of course, what you should really do is stop at the point where you have ability to load the exported format from Python 3 -- and use the aforementioned format as your actual, long-term storage format.
Using Pickle for anything other than short-term serialization between trusted programs (loading Pickle is equivalent to running arbitrary code in your Python VM) is something you should actively avoid, among other things because of the exact case you find yourself in.
In Anaconda Python3.5 : one can access cPickle as
import _pickle as cPickle
credits to Mike McKerns
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With