
python pickle UnicodeDecodeError

I'm trying to load the MNIST handwritten digit dataset, following the tutorial at http://neuralnetworksanddeeplearning.com/chap1.html

When I run the load_data_wrapper function, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

The code I run is:

import gzip
import pickle
import numpy as np


def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784,1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    e = np.zeros((10,1))
    e[j] = 1.0
    return e

UPDATE: The problem seems to be that I am using Python 3.6 to unpickle a file that was pickled with Python 2.x.

Nisitha Jayatilleka asked Nov 08 '16 18:11



1 Answer

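As stated in the update, the main problem turned out to be an incompatibility between Python 2.x cPickle and Python 3.x pickle. By default, Python 3 decodes the 8-bit strings stored in a Python 2 pickle as ASCII, which fails on any byte above 0x7f, such as the 0x90 in the error message.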

Setting the encoding to 'latin1' when unpickling seems to work:

training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
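The effect of the encoding argument can be reproduced without the MNIST file. The sketch below is only an illustration: PY2_PICKLE is a hand-written byte string standing in for what Python 2 would write for the 8-bit string '\x90', and try_load is a hypothetical helper, neither comes from the tutorial.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\x90'.
# This is an assumption for illustration, not data taken from mnist.pkl.gz.
PY2_PICKLE = b"\x80\x02U\x01\x90q\x00."


def try_load(data, **kwargs):
    """Return the unpickled value, or the UnicodeDecodeError that was raised."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


# Default: Python 3 decodes Python 2 strings as ASCII, so byte 0x90 fails.
default_result = try_load(PY2_PICKLE)

# latin1 maps every byte 0x00-0xff to a code point, so decoding cannot fail.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")

# 'bytes' skips decoding and returns the raw bytes object instead.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

print(type(default_result).__name__)
print(repr(latin1_result))
print(repr(bytes_result))
```

Of the two working options, 'latin1' is the one the Python documentation names for NumPy arrays pickled by Python 2, which is the case for this dataset.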

The answers to this question helped a lot: Pickle incompatibility of numpy arrays between Python 2 and 3
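Putting the fix back into load_data from the question might look like the sketch below. The path parameter and the with block are my own additions for convenience; the default path is the one used in the question.

```python
import gzip
import pickle


def load_data(path="../data/mnist.pkl.gz"):
    """Load a gzipped pickle written by Python 2 from within Python 3.

    The `path` parameter is an addition for illustration; the question
    hard-codes the default value shown here.
    """
    # The with block closes the file even if unpickling raises.
    with gzip.open(path, "rb") as f:
        # encoding='latin1' lets Python 3 read Python 2 8-bit strings.
        training_data, validation_data, test_data = pickle.load(
            f, encoding="latin1"
        )
    return training_data, validation_data, test_data
```

load_data_wrapper from the question can then call load_data() unchanged.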

Nisitha Jayatilleka answered Sep 29 '22 16:09