I have created a simple function for face recognition using the FaceRecognizer from OpenCV. It works fine with images of people.
Now I would like to run a test using handwritten characters instead of people. I came across the MNIST dataset, but it stores the images in a file format I have never seen before.
I simply need to extract a few images from train-images.idx3-ubyte and save them in a folder as .gif files.
Or am I misunderstanding this MNIST thing? If so, where could I get such a dataset?
EDIT
I also have the gzip file:
train-images-idx3-ubyte.gz
I am trying to read the content, but show() does not work, and if I read() it I see random symbols.
import gzip
images = gzip.open("train-images-idx3-ubyte.gz", 'rb')
print(images.read())
EDIT
Managed to get some useful output by using:
with gzip.open('train-images-idx3-ubyte.gz', 'r') as fin:
    for line in fin:
        print('got line', line)
Somehow I have to convert this now to an image, output:
Download the training/test images and labels:
And uncompress them in a working directory, say samples/.
Get the python-mnist package from PyPI:
pip install python-mnist
Import the mnist package and read the training/test images:
from mnist import MNIST
mndata = MNIST('samples')
images, labels = mndata.load_training()
# or
images, labels = mndata.load_testing()
To display an image to the console:
import random
index = random.randrange(0, len(images))  # choose an index ;-)
print(mndata.display(images[index]))
You'll get something like this:
............................
............................
............................
............................
............................
.................@@.........
..............@@@@@.........
............@@@@............
..........@@................
..........@.................
...........@................
...........@................
...........@...@............
...........@@@@@.@..........
...........@@@...@@.........
...........@@.....@.........
..................@.........
..................@@........
..................@@........
..................@.........
.................@@.........
...........@.....@..........
...........@....@@..........
............@@@@............
.............@..............
............................
............................
............................
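To make the rendering above less mysterious, here is a minimal sketch of how a flat list of grayscale values can be turned into such a text picture. The helper name render_ascii and the threshold of 128 are assumptions chosen for illustration; this is not claimed to be how the python-mnist package implements display().

```python
def render_ascii(pixels, width=28, threshold=128):
    """Render a flat sequence of grayscale values (0-255) as rows of text.

    Values above `threshold` become '@', everything else becomes '.'.
    """
    rows = []
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        rows.append("".join("@" if value > threshold else "." for value in row))
    return "\n".join(rows)


# Tiny 4x4 stand-in for a real 28x28 MNIST digit
sample = [0, 255, 255, 0,
          0, 255,   0, 0,
          0, 255,   0, 0,
          0, 255, 255, 0]
print(render_ascii(sample, width=4))
```

With a real image from the images list, the default width of 28 should apply, for example render_ascii(images[index]).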
Explanation:
Each image in the images list is a Python list of unsigned bytes.
The labels are a Python array of unsigned bytes.
(Using only matplotlib, gzip and numpy)
Extract image data:
import gzip
import numpy as np

image_size = 28  # MNIST images are 28x28 pixels
num_images = 5   # number of images to read

f = gzip.open('train-images-idx3-ubyte.gz', 'rb')
f.read(16)  # skip the 16-byte header
buf = f.read(image_size * image_size * num_images)
data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
data = data.reshape(num_images, image_size, image_size, 1)
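The 16 bytes skipped above are the IDX3 header: four big-endian unsigned 32-bit integers holding the magic number (2051 for image files), the number of images, the number of rows and the number of columns. As a sketch, the header can be parsed instead of skipped, so the sizes do not have to be hard-coded. The function name read_idx3_header is hypothetical, and the example runs on a synthetic in-memory file rather than the real download.

```python
import io
import struct


def read_idx3_header(fileobj):
    """Read the 16-byte header of an IDX3 image file.

    Returns (num_images, rows, cols). The four header fields are
    big-endian unsigned 32-bit integers: magic, count, rows, columns.
    """
    header = fileobj.read(16)
    if len(header) != 16:
        raise ValueError("file too short for an IDX3 header")
    magic, num_images, rows, cols = struct.unpack(">IIII", header)
    if magic != 2051:  # 0x00000803: unsigned-byte data with 3 dimensions
        raise ValueError("unexpected magic number: %d" % magic)
    return num_images, rows, cols


# Synthetic stand-in for gzip.open('train-images-idx3-ubyte.gz', 'rb'):
# a valid header followed by two blank 28x28 images.
fake = io.BytesIO(struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28))
print(read_idx3_header(fake))
```

After the header has been read, the file position is at the first pixel, so the f.read(image_size * image_size * num_images) call can follow directly.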
Display an image:
import matplotlib.pyplot as plt
image = np.asarray(data[2]).squeeze()
plt.imshow(image)
plt.show()
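Since the question asks for .gif files in a folder, here is a hedged sketch of that last step. It assumes the Pillow package is installed (pip install Pillow), which neither answer uses; the helper name save_as_gif and the file name pattern are made up for illustration. The demo writes synthetic images to a temporary directory instead of real MNIST data.

```python
import os
import tempfile

import numpy as np
from PIL import Image  # Pillow, assumed to be installed


def save_as_gif(images, out_dir, prefix="digit"):
    """Save each 28x28 (or 28x28x1) array in `images` as a grayscale GIF."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, img in enumerate(images):
        # Drop the trailing channel axis and convert back to 8-bit pixels
        pixels = np.asarray(img).squeeze().astype(np.uint8)
        path = os.path.join(out_dir, "%s_%d.gif" % (prefix, i))
        Image.fromarray(pixels).save(path)
        paths.append(path)
    return paths


# Synthetic stand-in with the same shape and dtype as `data` above
demo = np.zeros((2, 28, 28, 1), dtype=np.float32)
demo[:, 10:18, 10:18, 0] = 255  # white square in the middle
print(save_as_gif(demo, tempfile.mkdtemp()))
```

With the real array, something like save_as_gif(data, 'samples_gif') should write one file per extracted image.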
Print first 50 labels:
f = gzip.open('train-labels-idx1-ubyte.gz', 'rb')
f.read(8)  # skip the 8-byte header (magic number and label count)
for i in range(0, 50):
    buf = f.read(1)
    labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
    print(labels)
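The label file has an 8-byte header: a magic number (2049) and the label count, both big-endian unsigned 32-bit integers, followed by one byte per label. A possible alternative to the byte-by-byte loop is to read all wanted labels in a single call. This is only a sketch using the standard library; read_idx1_labels is a hypothetical helper, shown here on a synthetic in-memory file.

```python
import io
import struct


def read_idx1_labels(fileobj, count):
    """Return up to `count` labels from an IDX1 label file as a list of ints."""
    magic, total = struct.unpack(">II", fileobj.read(8))
    if magic != 2049:  # 0x00000801: unsigned-byte data with 1 dimension
        raise ValueError("unexpected magic number: %d" % magic)
    count = min(count, total)
    # Iterating over a bytes object yields integers in Python 3
    return list(fileobj.read(count))


# Synthetic stand-in for gzip.open('train-labels-idx1-ubyte.gz', 'rb')
fake = io.BytesIO(struct.pack(">II", 2049, 4) + bytes([5, 0, 4, 1]))
print(read_idx1_labels(fake, 3))
```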