I'm running into a bizarre problem. When I run
  from tensorflow.examples.tutorials.mnist import input_data
  mnist = input_data.read_data_sets('/home/fqiao/development/MNIST_data/', one_hot=True)
I get:
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 199, in read_data_sets
    train_images = extract_images(local_file)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 58, in extract_images
    magic = _read32(bytestream)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 51, in _read32
    return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 404, in _read_gzip_header
    magic = self._fp.read(2)
  File "/usr/lib/python3.5/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 45, in sync
    return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 199, in read
    return self._fp.read(n)
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
However, if I just run the code in input_data.py directly, everything appears to be fine:
>>> dt = numpy.dtype(numpy.uint32).newbyteorder('>')
>>> f = tf.gfile.Open('/home/fqiao/development/MNIST_data/train-images-idx3-ubyte.gz', 'rb')
>>> bytestream = gzip.GzipFile(fileobj=f)
>>> testbytes = numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
>>> testbytes
2051
Does anyone have any idea what's going on?
My system: Ubuntu 15.10 x64, Python 3.5.0.
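The bug has been addressed by a recent change, 555e73d: the MNIST files need to be opened in binary mode ('rb') instead of text mode ('r'). In text mode Python decodes the file as UTF-8, and the second byte of every gzip file is 0x8b, which is not a valid UTF-8 start byte. That is exactly the "can't decode byte 0x8b in position 1" in the traceback. Your manual test works because you passed 'rb' yourself.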
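To see the difference between the two modes without downloading MNIST, here is a small sketch. It writes a hypothetical stand-in for train-images-idx3-ubyte.gz (only the IDX header is realistic), reads the magic number in binary mode, and then reproduces the error by opening the same file in text mode. The helper names are made up for illustration, and struct.unpack('>I', ...) is used in place of the numpy.frombuffer call from input_data.py to keep it stdlib-only; both read one big-endian uint32.

```python
import gzip
import struct


def write_fake_idx3_gz(path):
    """Write a tiny gzip file whose payload starts like an MNIST image file.

    Hypothetical stand-in for train-images-idx3-ubyte.gz: magic 2051, then
    image count, rows, cols (all big-endian uint32), followed by 4 pixel bytes.
    """
    header = struct.pack('>IIII', 2051, 1, 2, 2)
    with gzip.open(path, 'wb') as gz:
        gz.write(header + bytes(4))


def read_magic(path):
    """Read the first big-endian uint32 from a gzipped IDX file (binary mode)."""
    with open(path, 'rb') as f:  # 'rb': gzip data must not be decoded as text
        with gzip.GzipFile(fileobj=f) as bytestream:
            # Equivalent to numpy.frombuffer(bytestream.read(4), dtype='>u4')[0]
            (magic,) = struct.unpack('>I', bytestream.read(4))
    return magic


def read_as_text(path):
    """Reproduce the bug: text mode decodes the raw gzip bytes as UTF-8."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


if __name__ == '__main__':
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'fake-idx3-ubyte.gz')
        write_fake_idx3_gz(path)
        print(read_magic(path))  # 2051
        try:
            read_as_text(path)
        except UnicodeDecodeError as err:
            # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
            print(err)
```

The first gzip byte (0x1f) happens to be valid ASCII, which is why the decoder only fails at position 1.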
In my case, the problem was the encoding of the data file.
Open the file in vim and execute:
:set fileencoding=utf-8
then save it with :w (the new encoding only takes effect when the file is written).
That solved the issue for me.
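If vim isn't handy, a rough Python equivalent of that re-encoding is sketched below. The function name and the default source encoding (latin-1) are assumptions: vim works out the original encoding by itself, whereas here you must pass the encoding the file actually uses. This only makes sense for plain-text data files; running it on a binary file such as a .gz archive would corrupt it.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding='latin-1'):
    """Rewrite a text file as UTF-8, roughly what vim does on :w after
    :set fileencoding=utf-8.

    source_encoding is an assumption: pass the encoding the file really uses.
    Only for text data; never run this on binary files such as .gz archives.
    """
    p = Path(path)
    # Work on raw bytes so line endings are preserved exactly.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode('utf-8'))
```

Usage: reencode_to_utf8('data.txt', source_encoding='cp1252') rewrites data.txt in place.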