I'm trying to load the MNIST character dataset (following the tutorial outlined here: http://neuralnetworksanddeeplearning.com/chap1.html ).
When I run the load_data_wrapper function I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
The code being run is:

import pickle
import gzip

import numpy as np

def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
UPDATE: The problem seems to be that I am trying to unpickle a file with Python 3.6 that was pickled with Python 2.x.
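In that situation, passing an encoding to pickle.load is the usual fix. A minimal sketch of load_data with the fix applied (assuming the same ../data/mnist.pkl.gz path as above):

import gzip
import pickle

def load_data():
    # encoding='latin1' tells pickle how to decode the byte strings
    # written by Python 2's cPickle, so the NumPy arrays load cleanly.
    with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    return (training_data, validation_data, test_data)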
The Python pickle documentation states that using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2. (The same documentation adds that if buffers is None, the default, then all data necessary for deserialization must be contained in the pickle stream.)
As an aside, Python's bz2 library enables bz2 compression for any file. By sacrificing some of the speed gained by pickling your data, you can compress it to roughly a quarter of its original size; a short sketch follows.
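A minimal sketch, assuming a hypothetical data.pkl.bz2 path and a throwaway object to pickle:

import bz2
import pickle

data = {'weights': [0.1, 0.2, 0.3]}  # any picklable object

# bz2.open behaves like open() but compresses/decompresses on the fly.
with bz2.open('data.pkl.bz2', 'wb') as f:
    pickle.dump(data, f)

with bz2.open('data.pkl.bz2', 'rb') as f:
    restored = pickle.load(f)

print(restored == data)  # True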
The first step in unpickling a file is to load it back into a Python program. Use the open() function with the 'rb' argument, which indicates that the file should be opened for reading in binary mode: 'r' stands for read mode and 'b' stands for binary mode.
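For an ordinary, uncompressed pickle that looks like the sketch below (hypothetical model.pkl path; the MNIST file above is gzip-compressed, so it goes through gzip.open instead):

import pickle

with open('model.pkl', 'rb') as f:  # 'rb' = read, binary
    obj = pickle.load(f)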
As stated, the main problem turned out to be an incompatibility between Python 2.x cPickle and Python 3.x pickle.
Setting the encoding to 'latin-1' seems to work:
training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
This answer helped a lot: Pickle incompatibility of numpy arrays between Python 2 and 3
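As a final sanity check (a sketch assuming the fixed load_data and the load_data_wrapper above), the data should come back with the expected shapes; note that in Python 3 zip() returns an iterator, so it is materialized with list() before indexing:

training_data, validation_data, test_data = load_data_wrapper()
training_data = list(training_data)
print(len(training_data))   # should be 50000 training examples
x, y = training_data[0]
print(x.shape, y.shape)     # should be (784, 1) (10, 1)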