I'm using Python 3.4 on a Windows 7 64-bit system. I ran the following code:
6 """ load single batch of cifar """
7 with open(filename, 'r') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
The error message is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence
I changed line 7 to:
6 """ load single batch of cifar """
7 with open(filename, 'r',encoding='utf-8') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
The error message then became: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
The traceback ends in Python34\lib\codecs.py, in decode(self, input, final):
311 # decode input (taking the buffer into account)
312 data = self.buffer + input
--> 313 (result, consumed) = self._buffer_decode(data, self.errors, final)
314 # keep undecoded input until the next call
315 self.buffer = data[consumed:]
I then changed the code to:
6 """ load single batch of cifar """
7 with open(filename, 'rb') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
10 Y = datadict['labels']
This time the error is: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
What is the problem and how to solve it?
Pickle files are binary data files, so you always have to open the file in 'rb' mode when loading. Don't use text mode here.
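A quick way to see why text mode can't work: the sketch below pickles a small stand-in dict (not the actual CIFAR data) with protocol 2, the highest protocol Python 2 supports, and inspects the first bytes. The leading 0x80 is the same byte both text-mode errors in the question complain about.

```python
import pickle

# Pickle a small stand-in object with protocol 2, the highest protocol
# Python 2 supports (the real CIFAR batch is a larger dict).
payload = pickle.dumps({'labels': [0, 1]}, protocol=2)

# Protocol 2+ streams begin with the PROTO opcode (0x80) and the protocol number.
print(payload[:2])  # b'\x80\x02'

# 0x80 can never start a UTF-8 sequence, so decoding the stream as text fails
# at position 0, just like open(filename, 'r', encoding='utf-8') did.
try:
    payload.decode('utf-8')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Loaded as raw bytes (what 'rb' mode gives you), the stream works fine.
print(pickle.loads(payload))
```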
You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or to leave it as bytes.
The default is to try to decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
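To reproduce the last error without a Python 2 interpreter, the sketch below hand-assembles the bytes Python 2 would write for a one-character 8-bit str under protocol 2. The opcode layout (PROTO, SHORT_BINSTRING, STOP) is an assumption about what Python 2 emits for short strings, chosen for illustration; the byte 0x8b is borrowed from the error message in the question.

```python
import pickle

# Hand-built stand-in for a Python 2 pickle of the 8-bit str '\x8b':
#   \x80\x02   PROTO 2
#   U\x01\x8b  SHORT_BINSTRING of length 1 containing byte 0x8b
#   .          STOP
py2_pickle = b'\x80\x02U\x01\x8b.'

# Default encoding='ASCII', errors='strict': fails like the question's last error.
try:
    pickle.loads(py2_pickle)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = str(exc)
print(default_error)

# encoding='bytes' keeps the Python 2 str as a bytes object.
as_bytes = pickle.loads(py2_pickle, encoding='bytes')
print(as_bytes)

# encoding='latin1' maps each byte to the code point with the same value.
as_text = pickle.loads(py2_pickle, encoding='latin1')
print(repr(as_text))
```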
Setting the encoding to latin1 allows you to load the data directly:
import pickle

with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1')
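Wrapped up as a reusable function, the fix could look like the sketch below. The name load_cifar_batch is hypothetical, and it assumes the batch is a dict with 'data' and 'labels' keys, as in the question's code.

```python
import pickle


def load_cifar_batch(filename):
    """Load one CIFAR batch pickled by Python 2 (hypothetical helper).

    The file is opened in binary mode, and Python 2 str objects are
    decoded as latin1 so that no byte value can cause a decode error.
    """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
    X = datadict['data']
    Y = datadict['labels']
    return X, Y
```

For example, X, Y = load_cifar_batch('data_batch_1'), where the path is an assumption about where the extracted batch file lives.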
It appears that it is the NumPy array data that causes the problem here, since all the strings in the set use ASCII characters only.
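The reason latin1 is safe for the array data is that it is lossless: each byte 0x00 to 0xFF maps to the code point with the same value, so the original buffer can be recovered exactly by encoding back with latin1. ASCII covers only 0x00 to 0x7F, which is why raw image bytes such as 0x8b break it. A minimal check, using all 256 byte values as a stand-in for a raw pixel buffer:

```python
# Every possible byte value, standing in for the raw buffer of a uint8 array.
raw = bytes(range(256))

# ASCII only covers 0x00-0x7F, so it rejects the upper half of the range.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# latin1 maps byte n to code point n: decoding never fails, and
# encoding back with latin1 restores the exact original bytes.
text = raw.decode('latin1')
roundtrip = text.encode('latin1')

print(ascii_ok)          # False
print(roundtrip == raw)  # True
```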
The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
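With encoding='bytes', the loaded dictionary might be handled as in the sketch below. The pickle is a hand-built stand-in for a Python 2 pickle of {'data': 'ab\x8b', 'labels': [0]} (protocol 2, short strings stored with SHORT_BINSTRING), not real CIFAR data; the key-decoding comprehension is one possible approach, not a pickle feature.

```python
import pickle

# Hand-built Python 2 style pickle (protocol 2) of the dict
# {'data': 'ab\x8b', 'labels': [0]}, with all str objects as SHORT_BINSTRING.
py2_pickle = b'\x80\x02}(U\x04dataU\x03ab\x8bU\x06labels]K\x00au.'

# Every Python 2 str, including the dictionary keys, comes back as bytes.
datadict = pickle.loads(py2_pickle, encoding='bytes')
print(datadict)

# Option 1: use bytes literals for the keys.
labels = datadict[b'labels']

# Option 2: decode the top-level keys once, leaving the values untouched.
decoded = {k.decode('ascii') if isinstance(k, bytes) else k: v
           for k, v in datadict.items()}
X = decoded['data']  # the value is still bytes
```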
If you want to open a text file as UTF-8, you need to write:
open(file_name, 'r', encoding='UTF-8')
If you get the GBK error instead (or the file is binary data, such as a pickle), open it in binary mode:
open(file_name, 'rb')
Hope this solves your problem!
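A sketch of the difference between the two modes follows. When no encoding is passed, text-mode open() falls back to locale.getpreferredencoding(False), which on a Chinese-language Windows install is typically cp936 (Python's 'gbk' codec); that is most likely where the 'gbk' in the question's first error came from. The temporary file and its contents are made up for illustration.

```python
import locale
import os
import tempfile

# The encoding text-mode open() uses when none is given (locale-dependent).
print(locale.getpreferredencoding(False))

# Write a small UTF-8 text file with made-up content to a temporary path.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write('caf\u00e9 text'.encode('utf-8'))

# Text mode with an explicit encoding: independent of the locale default.
with open(path, 'r', encoding='UTF-8') as f:
    text = f.read()

# Binary mode: no decoding at all, you get the raw bytes.
with open(path, 'rb') as f:
    raw = f.read()

os.remove(path)
print(raw)
print(raw.decode('utf-8') == text)  # True
```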