Change size of train and test set from MNIST Dataset

Question

I'm using MNIST and Keras to learn about CNNs. I'm downloading the MNIST database of handwritten digits through the Keras API as shown below. The dataset is already split into 60,000 images for training and 10,000 images for testing (see Dataset - Keras Documentation).

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

How can I join the training and test sets and then separate them into 70% for training and 30% for testing?

Mikhail Stepanov · Accepted Answer

There's no such argument in mnist.load_data. Instead, you can concatenate the data with numpy and then split it with sklearn (or numpy):

from keras.datasets import mnist
import numpy as np
from sklearn.model_selection import train_test_split

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x = np.concatenate((x_train, x_test))
y = np.concatenate((y_train, y_test))

train_size = 0.7
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=train_size, random_state=2019)

Set a random seed (the random_state argument) for reproducibility.
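To see what the seed buys you, here is a minimal sketch that runs the split twice with the same random_state and checks that both runs return identical arrays. The small random arrays are stand-ins for the concatenated MNIST data (an assumption, so the example runs without a download), and split_once is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 1000 fake 28x28 "images" with labels 0-9.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=1000)


def split_once(seed):
    """Hypothetical helper: one 70/30 split with a fixed seed."""
    return train_test_split(x, y, train_size=0.7, random_state=seed)


first = split_once(2019)
second = split_once(2019)

# Same seed, same split: every returned array matches its counterpart.
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same, first[0].shape, first[1].shape)
```

Without random_state, each call shuffles differently, so results from two runs of the same script would not be comparable.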

Via numpy (if you don't use sklearn):

# do the same concatenation
np.random.seed(2019)
train_size = 0.7
index = np.random.rand(len(x)) < train_size  # boolean index
x_train, x_test = x[index], x[~index]  # index and its negation
y_train, y_test = y[index], y[~index]

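You'll get arrays of approximately the required size (about 210xx test images instead of exactly 21000), because each sample is assigned to the training set independently with probability 0.7.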
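If you need exact sizes with numpy alone, one option is to shuffle the row indices and cut them at a fixed position. This is a sketch of that alternative, not part of the approach above; the small x and y arrays are stand-ins (an assumption) for the concatenated MNIST data.

```python
import numpy as np

# Stand-in data (assumption): replace with the concatenated MNIST x and y.
x = np.arange(1000 * 4).reshape(1000, 4)
y = np.arange(1000) % 10

train_size = 0.7
rng = np.random.default_rng(2019)           # seeded for reproducibility
order = rng.permutation(len(x))             # shuffled row indices
n_train = int(round(train_size * len(x)))   # exact number of training rows

# The first n_train shuffled indices go to training, the rest to testing.
train_idx, test_idx = order[:n_train], order[n_train:]
x_train, x_test = x[train_idx], x[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(len(x_train), len(x_test))
```

Because the same index arrays are applied to x and y, images and labels stay aligned, and no sample ends up in both sets.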

Judging by its source code, mnist.load_data simply fetches the data from a URL where it is already split into 60000 training and 10000 test images, so concatenation is the only workaround.

You could also download the MNIST dataset from http://yann.lecun.com/exdb/mnist/, preprocess it manually, and then concatenate it as you need. But, as far as I understand, it was divided into 60000 examples for training and 10000 for testing because this split is used in standard benchmarks.
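For the manual route, the files on that page use the IDX layout: two zero bytes, a data-type byte (0x08 for unsigned bytes), a dimension-count byte, then one big-endian 32-bit size per dimension, followed by the raw data. A minimal parsing sketch, assuming the buffer has already been decompressed; parse_idx is a hypothetical helper name, and the tiny buffer below is synthetic rather than real MNIST data.

```python
import struct
import numpy as np


def parse_idx(raw):
    """Decode an uncompressed IDX buffer of unsigned bytes into an array."""
    zeros, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # One big-endian 32-bit size per dimension follows the magic number.
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(shape)


# Tiny synthetic buffer (assumption): two 2x3 "images" in IDX layout.
pixels = bytes(range(12))
raw = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3) + pixels
images = parse_idx(raw)
print(images.shape)
```

The downloaded files are gzip-compressed, so you would read them with gzip.open(path, "rb").read() before passing the bytes to a parser like this one, then concatenate and split the resulting arrays as shown earlier.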

Tags:

python

keras

mnist

Thulio Amorim
