I'm trying to work with the CIFAR-10 dataset, which is available in a special version for Python.
It is a set of binary files, each representing a dictionary of 10k numpy matrices. The files were obviously created by Python 2's cPickle.
I tried to load it from python2 as follows:
import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)
This works great. However, if I try to load the data from Python 3 (which has pickle instead of cPickle), it fails:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)
It fails with the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
Can I somehow transform the original dataset into a new one that is readable from Python 3? Or can I somehow read it from Python 3 directly?
I've tried loading it with cPickle, dumping it to JSON and reading it back with pickle, but numpy matrices obviously can't be written to a JSON file.
You'll need to tell pickle what codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:
The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
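For the first option, naming a codec, a minimal sketch might look like the following. It assumes latin1 is an acceptable choice: latin1 maps every byte value to a character, so decoding cannot fail, and the Python 3 pickle documentation names it as the encoding required for unpickling NumPy arrays pickled by Python 2. The helper name load_with_codec is hypothetical, not part of pickle.

```python
import pickle


def load_with_codec(path, codec="latin1"):
    """Load a Python 2 pickle, decoding its 8-bit strings with `codec`.

    `load_with_codec` is an illustrative helper, not a pickle API.
    """
    with open(path, "rb") as f:
        # latin1 accepts all 256 byte values, so no UnicodeDecodeError
        return pickle.load(f, encoding=codec)
```

With this approach the dictionary keys come back as ordinary str objects, but any text that was not really latin1 will be decoded to the wrong characters.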
To load the strings as bytes objects, that'd be:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
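One consequence of encoding='bytes' is that Python 2 string keys in the loaded dictionary come back as bytes, so a lookup would need something like b'data' rather than 'data'. The sketch below shows one way to convert top-level keys back to str. It assumes the keys are plain ASCII, and the helper names decode_keys and load_batch are hypothetical.

```python
import pickle


def decode_keys(obj):
    """Return a copy of a dict with bytes keys decoded to str.

    Assumes the keys are ASCII; values are left untouched.
    """
    if isinstance(obj, dict):
        return {
            (k.decode("ascii") if isinstance(k, bytes) else k): v
            for k, v in obj.items()
        }
    return obj


def load_batch(path):
    """Load a Python 2 pickle as bytes, then normalise the top-level keys."""
    with open(path, "rb") as f:
        raw = pickle.load(f, encoding="bytes")
    return decode_keys(raw)
```

Only the top-level keys are converted here; nested dictionaries and the values themselves stay as bytes.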