Can anyone help me split the MNIST dataset into training, testing and validation sets with whatever ratios I choose?
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
I want a 70-20-10 split for training, validation and testing.
Assuming you do not want to keep the default train/test split provided by the tf.keras.datasets.mnist API, you can concatenate the train and test sets and then iteratively split the result into train, validation and test sets according to your ratios.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
DATASET_SIZE = 70000
TRAIN_RATIO = 0.7
VALIDATION_RATIO = 0.2
TEST_RATIO = 0.1
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
X = np.concatenate([x_train, x_test])
y = np.concatenate([y_train, y_test])
If you want the datasets to be NumPy arrays, you can use the train_test_split() function from sklearn.model_selection. Here is an example:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=(1-TRAIN_RATIO))
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=((TEST_RATIO/(VALIDATION_RATIO+TEST_RATIO))))
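To sanity-check the two-step split, here is a minimal self-contained sketch. It uses zero-filled arrays of the same shape as the concatenated MNIST data (70000 samples of 28x28) as a stand-in, so it runs without downloading anything; the random_state values are arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

TRAIN_RATIO, VALIDATION_RATIO, TEST_RATIO = 0.7, 0.2, 0.1

# Synthetic stand-in for the concatenated MNIST arrays.
X = np.zeros((70000, 28, 28), dtype=np.uint8)
y = np.zeros(70000, dtype=np.uint8)

# Step 1: split off the training set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=(1 - TRAIN_RATIO), random_state=42
)
# Step 2: divide the remainder between validation and test
# according to their relative proportions (0.2 : 0.1).
X_val, X_test, y_val, y_test = train_test_split(
    X_val, y_val,
    test_size=TEST_RATIO / (VALIDATION_RATIO + TEST_RATIO),
    random_state=42,
)

# The three parts partition the original data, roughly 49000/14000/7000
# (exact counts can be off by one due to floating-point rounding).
print(len(X_train), len(X_val), len(X_test))
```

Note that train_test_split shuffles by default, so the two test-MNIST halves are already mixed after the first call.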
If you prefer to use the tf Dataset API then you can use the .take() and .skip() methods as follows:
dataset = tf.data.Dataset.from_tensor_slices((X, y))
train_dataset = dataset.take(int(TRAIN_RATIO*DATASET_SIZE))
validation_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).take(int(VALIDATION_RATIO*DATASET_SIZE))
test_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).skip(int(VALIDATION_RATIO*DATASET_SIZE))
Furthermore, you should call .shuffle() on the dataset before the split to generate shuffled partitions. Note that .shuffle() requires a buffer_size argument, and you should pass reshuffle_each_iteration=False (or a fixed seed) so the dataset is not reshuffled between the .take()/.skip() calls, which would otherwise leak samples across the partitions:
dataset = dataset.shuffle(buffer_size=DATASET_SIZE, reshuffle_each_iteration=False)
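Putting the tf.data variant together, here is a runnable sketch. It uses a small synthetic dataset of 1000 elements instead of the real 70000-sample MNIST arrays so it runs quickly without a download; the seed value is arbitrary:

```python
import numpy as np
import tensorflow as tf

DATASET_SIZE = 1000  # small stand-in for the 70000 MNIST samples
TRAIN_RATIO, VALIDATION_RATIO = 0.7, 0.2

X = np.zeros((DATASET_SIZE, 28, 28), dtype=np.uint8)
y = np.arange(DATASET_SIZE, dtype=np.int32)

dataset = tf.data.Dataset.from_tensor_slices((X, y))
# Shuffle once, with a fixed seed and no reshuffling between epochs,
# so that take()/skip() carve out disjoint, stable partitions.
dataset = dataset.shuffle(
    buffer_size=DATASET_SIZE, seed=42, reshuffle_each_iteration=False
)

train_size = int(TRAIN_RATIO * DATASET_SIZE)
val_size = int(VALIDATION_RATIO * DATASET_SIZE)

train_dataset = dataset.take(train_size)
validation_dataset = dataset.skip(train_size).take(val_size)
test_dataset = dataset.skip(train_size + val_size)

# cardinality() reports the number of elements in each partition:
print(train_dataset.cardinality().numpy())       # 700
print(validation_dataset.cardinality().numpy())  # 200
print(test_dataset.cardinality().numpy())        # 100
```

Because the shuffle is fixed, iterating the three datasets yields three disjoint subsets that together cover every sample exactly once.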