
Pickle incompatibility of numpy arrays between Python 2 and 3

I am trying to load the MNIST dataset linked here in Python 3.2 using this program:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

Unfortunately, it gives me the error:

Traceback (most recent call last):
   File "mnist.py", line 7, in <module>
     train_set, valid_set, test_set = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

I then tried to decode the pickled file in Python 2.7, and re-encode it. So, I ran this program in Python 2.7:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f)

    # Printing out the three objects reveals that they are
    # all pairs containing numpy arrays.

    with gzip.open('mnistx.pkl.gz', 'wb') as g:
        pickle.dump(
            (train_set, valid_set, test_set),
            g,
            protocol=2)  # I also tried protocol 0.

It ran without error, so I reran this program in Python 3.2:

import pickle
import gzip
import numpy

# note the filename change
with gzip.open('mnistx.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

However, it gave me the same error as before. How do I get this to work?


This is a better approach for loading the MNIST dataset.

Neil G asked Jul 03 '12 06:07



6 Answers

If you are getting this error in Python 3, it could be an incompatibility between Python 2 and Python 3. For me, the solution was to load with latin1 encoding:

pickle.load(file, encoding='latin1')
Tshilidzi Mudau answered Sep 30 '22 20:09


This seems like some sort of incompatibility. It's trying to load a "binstring" object, which is assumed to be ASCII, while in this case it is binary data. Whether this is a bug in the Python 3 unpickler or a "misuse" of the pickler by numpy, I don't know.
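The failure can be reproduced without the MNIST file. The sketch below is only an illustration: it assumes a hand-written protocol 2 byte stream (the `py2_pickle` name and its three payload bytes are made up for the example) that mimics what Python 2 emits for an 8-bit `str`. It lists the opcodes with `pickletools.genops` and shows that the default ASCII decoding in Python 3 raises the same kind of error as in the question.

```python
import pickle
import pickletools

# Hand-built protocol 2 pickle mimicking Python 2's output for the
# 8-bit str '\x90\xab\xcd' (hypothetical payload, not taken from MNIST):
#   \x80\x02  PROTO 2
#   U\x03...  SHORT_BINSTRING, length 3, followed by the raw bytes
#   q\x00     BINPUT 0 (memoize)
#   .         STOP
py2_pickle = b"\x80\x02U\x03\x90\xab\xcdq\x00."

# Walk the stream and collect the opcode names it contains.
opcode_names = [op.name for op, arg, pos in pickletools.genops(py2_pickle)]
print(opcode_names)

try:
    pickle.loads(py2_pickle)  # default encoding is 'ASCII'
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)
```

The binstring opcode carries raw bytes with no encoding information, so the Python 3 unpickler has to guess, and its default guess of ASCII fails on any byte above 0x7f.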

Here is something of a workaround, but I don't know how meaningful the data is at this point:

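import pickle
import gzip
import numpy

with open('mnist.pkl', 'rb') as f:
    # _Unpickler is the private pure-Python unpickler; setting its
    # encoding attribute makes it decode Python 2 strings as latin1.
    u = pickle._Unpickler(f)
    u.encoding = 'latin1'
    p = u.load()
    print(p)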

Unpickling it in Python 2 and then repickling it is only going to create the same problem again, so you need to save it in another format.

Lennart Regebro answered Sep 30 '22 19:09


It appears to be an incompatibility issue between Python 2 and Python 3. I tried loading the MNIST dataset with

    train_set, valid_set, test_set = pickle.load(file, encoding='iso-8859-1')

and it worked in Python 3.5.2.
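As a side note on why this matches the latin1 suggestions in the other answers: 'iso-8859-1' and 'latin1' are two names for the same codec, and that codec maps every byte value to the code point with the same number. A small check, using only the standard library (the variable names are illustrative):

```python
import codecs

# Both spellings resolve to the same codec.
same_codec = codecs.lookup("latin1").name == codecs.lookup("iso-8859-1").name

# Every possible byte value survives decode followed by encode.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin1")
round_trip = as_text.encode("latin1")

print(same_codec, round_trip == all_bytes)
```

Because the mapping is one-to-one, decoding with this codec cannot raise an error and the original bytes remain recoverable, which is presumably why it works for binary array data.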

Steve answered Sep 30 '22 21:09


It looks like there are some compatibility issues in pickle between 2.x and 3.x due to the move to Unicode. Your file appears to have been pickled with Python 2.x, and decoding it in 3.x could be troublesome.

I'd suggest unpickling it with Python 2.x and saving it to a format that plays more nicely across the two versions you're using.
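One possible such format is numpy's own `.npz` archive. The following is a rough sketch, not a tested conversion of the real dataset: the small arrays stand in for the unpickled MNIST splits, and the array names, shapes and file name are all hypothetical. The save step would run under Python 2 after `pickle.load`, and the load step under Python 3.

```python
import os
import tempfile

import numpy as np

# Stand-ins for one unpickled MNIST split (hypothetical shapes and values).
train_x = np.arange(12, dtype=np.float32).reshape(3, 4)
train_y = np.array([5, 0, 4], dtype=np.int64)

path = os.path.join(tempfile.mkdtemp(), "mnist_subset.npz")

# Step 1 (Python 2 side): write the arrays as a compressed .npz archive.
np.savez_compressed(path, train_x=train_x, train_y=train_y)

# Step 2 (Python 3 side): read the arrays back without going through pickle.
with np.load(path) as data:
    loaded_x = data["train_x"]
    loaded_y = data["train_y"]

print(loaded_x.shape, loaded_y)
```

Plain numeric arrays stored this way are written in numpy's binary array format rather than as pickled strings, so the string encoding question does not come up.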

John Lyon answered Sep 30 '22 21:09


I just stumbled upon this snippet. Hope this helps to clarify the compatibility issue.

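import gzip
import pickle
import sys

with gzip.open('mnist.pkl.gz', 'rb') as f:
    if sys.version_info.major > 2:
        # Python 3: decode Python 2 8-bit strings as latin1.
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    else:
        # Python 2: pickle.load takes no encoding argument.
        train_set, valid_set, test_set = pickle.load(f)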
serge answered Sep 30 '22 19:09


Try:

l = list(pickle.load(f, encoding='bytes'))   # if you are loading image data, or
l = list(pickle.load(f, encoding='latin1'))  # if you are loading text data

From the documentation of the pickle.load method:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2.

If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3.

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
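To make the effect of these arguments concrete, here is a small sketch using a hand-written protocol 2 byte stream that mimics what Python 2 would emit for the dict {'a': '\x90'}. The stream and the variable names are made up for illustration; nothing here comes from the MNIST file.

```python
import pickle

# Hand-built protocol 2 pickle mimicking Python 2's {'a': '\x90'}
# (hypothetical data): EMPTY_DICT, two SHORT_BINSTRINGs, SETITEM, STOP.
py2_pickle = b"\x80\x02}q\x00U\x01aq\x01U\x01\x90q\x02s."

# latin1: every 8-bit string becomes a str, one character per byte.
as_latin1 = pickle.loads(py2_pickle, encoding="latin1")

# bytes: every 8-bit string becomes a bytes object, dict keys included.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

# ASCII with errors='replace': undecodable bytes become U+FFFD.
as_replaced = pickle.loads(py2_pickle, encoding="ASCII", errors="replace")

print(as_latin1)
print(as_bytes)
print(as_replaced)
```

Note that with encoding='bytes' the dictionary key is also returned as bytes, so lookups need b'a' rather than 'a', while errors='replace' loads without raising but discards the original byte value.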

Manish Kumbhare answered Sep 30 '22 20:09