
How to read data serialized by Python 2 cPickle with Python 3 pickle?

I'm trying to work with the CIFAR-10 dataset, which provides a special version for Python.

It is a set of binary files, each representing a dictionary of 10k numpy matrices. The files were evidently created with Python 2's cPickle.

I tried to load it from python2 as follows:

import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)

This works fine. However, if I try to load the data from Python 3 (which has pickle instead of cPickle), it fails:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)

It fails with the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

Can I somehow transform the original dataset into a new one that is readable from Python 3? Or can I somehow read it from Python 3 directly?

I've tried loading it with cPickle, dumping it to JSON and reading it back in Python 3, but numpy matrices can't be written to a JSON file.

petrbel asked Nov 22 '15 15:11



1 Answer

You'll need to tell pickle what codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

To load the strings as bytes objects, that would be:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
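
To see the effect of the encoding argument without the CIFAR-10 files at hand, here is a minimal sketch. It assumes a tiny hand-written protocol-0 pickle as a stand-in for a Python 2 file, and decode_keys is a hypothetical helper, not part of pickle. Note that with encoding='bytes' the dictionary keys come back as bytes too, so you index with b'labels' rather than 'labels'. The pickle documentation also mentions encoding='latin1' as the setting needed for NumPy arrays pickled by Python 2; that variant returns str objects instead.

```python
import pickle

# Hand-written protocol-0 pickle standing in for a Python 2 file (an assumption
# for illustration): the dict {'labels': '\x8b'}, whose value is an 8-bit str
# containing a non-ASCII byte.
PY2_PICKLE = b"(dp0\nS'labels'\np1\nS'\\x8b'\np2\ns."


def decode_keys(d, encoding="ascii"):
    """Hypothetical helper: turn bytes keys into str, leave values untouched."""
    return {
        (k.decode(encoding) if isinstance(k, bytes) else k): v
        for k, v in d.items()
    }


# Default settings (encoding='ASCII', errors='strict') fail on byte 0x8b.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# encoding='bytes' keeps every Python 2 str as a bytes object, keys included.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

# encoding='latin1' maps each byte to the code point of the same value,
# so every Python 2 str becomes a Python 3 str.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin1")

# Optionally restore str keys after loading with encoding='bytes'.
fixed = decode_keys(as_bytes)
```

The same keyword arguments apply to pickle.load() on an open file, so either variant can replace the failing call in the question.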
Martijn Pieters answered Sep 20 '22 15:09
