 

MNIST dataset splitting

Can anyone help me split the MNIST dataset into training, testing and validation sets with whatever ratios we choose?

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Use a 70-20-10 split for training, validation and testing.

Taranjeet Singh asked Dec 01 '25


1 Answer

Assuming that you do not want to keep the default train/test split provided by the tf.keras.datasets.mnist API, you can concatenate the train and test sets and then iteratively split them into train, val and test based on your ratios.

from sklearn.model_selection import train_test_split
import numpy as np
import tensorflow as tf

DATASET_SIZE = 70000
TRAIN_RATIO = 0.7
VALIDATION_RATIO = 0.2
TEST_RATIO = 0.1

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

X = np.concatenate([x_train, x_test])
y = np.concatenate([y_train, y_test])

If you want the splits as NumPy arrays, you can use sklearn's train_test_split() function. For example:

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=(1-TRAIN_RATIO))
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=((TEST_RATIO/(VALIDATION_RATIO+TEST_RATIO))))
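If sklearn is not available, the same two-stage ratio arithmetic can be sketched with plain NumPy indexing. This is a minimal sketch: the zero-filled `X_dummy`/`y_dummy` arrays just stand in for the concatenated MNIST arrays so the split sizes can be checked without downloading anything.

```python
import numpy as np

DATASET_SIZE = 70000
TRAIN_RATIO, VALIDATION_RATIO, TEST_RATIO = 0.7, 0.2, 0.1

# Stand-in data with the same leading dimension as the full MNIST set.
X_dummy = np.zeros((DATASET_SIZE, 28, 28), dtype=np.uint8)
y_dummy = np.zeros(DATASET_SIZE, dtype=np.uint8)

# Shuffle the indices once, then carve out contiguous index ranges.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(DATASET_SIZE)

n_train = int(TRAIN_RATIO * DATASET_SIZE)
n_val = int(VALIDATION_RATIO * DATASET_SIZE)

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X_dummy[train_idx], y_dummy[train_idx]
X_val, y_val = X_dummy[val_idx], y_dummy[val_idx]
X_test, y_test = X_dummy[test_idx], y_dummy[test_idx]

print(X_train.shape[0], X_val.shape[0], X_test.shape[0])  # 49000 14000 7000
```

Because the three index ranges partition one permutation, the splits are guaranteed to be disjoint and to cover every example exactly once.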

If you prefer to use the tf Dataset API then you can use the .take() and .skip() methods as follows:

dataset = tf.data.Dataset.from_tensor_slices((X, y))

train_dataset = dataset.take(int(TRAIN_RATIO*DATASET_SIZE))
validation_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).take(int(VALIDATION_RATIO*DATASET_SIZE))
test_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).skip(int(VALIDATION_RATIO*DATASET_SIZE))
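The take/skip arithmetic maps directly onto ordinary slicing, so the partition sizes can be sanity-checked without TensorFlow. In this sketch the list `records` is just a stand-in for the dataset elements:

```python
DATASET_SIZE = 70000
TRAIN_RATIO, VALIDATION_RATIO = 0.7, 0.2

records = list(range(DATASET_SIZE))  # stand-in for the (X, y) records

n_train = int(TRAIN_RATIO * DATASET_SIZE)
n_val = int(VALIDATION_RATIO * DATASET_SIZE)

# .take(n) keeps the first n elements; .skip(n) drops the first n.
train_part = records[:n_train]                   # dataset.take(n_train)
val_part = records[n_train:n_train + n_val]      # .skip(n_train).take(n_val)
test_part = records[n_train + n_val:]            # .skip(n_train).skip(n_val)

print(len(train_part), len(val_part), len(test_part))  # 49000 14000 7000
```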

Furthermore, you can shuffle the dataset before the split to generate shuffled partitions. Note that .shuffle() requires a buffer_size argument, and that by default it reshuffles on every iteration, which would mix examples across the three splits; pass reshuffle_each_iteration=False so the partitions stay fixed:

dataset = dataset.shuffle(buffer_size=DATASET_SIZE, reshuffle_each_iteration=False)
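Alternatively, you can shuffle the NumPy arrays once up front, before building the tf.data pipeline, so the split does not depend on tf.data's shuffling behaviour at all. A minimal sketch, with small dummy arrays standing in for the concatenated X and y:

```python
import numpy as np

# Dummy arrays standing in for the concatenated X and y.
X_arr = np.arange(10)
y_arr = np.arange(10)

# One fixed permutation applied to both arrays keeps each (x, y) pair aligned.
rng = np.random.default_rng(seed=42)
perm = rng.permutation(len(X_arr))
X_shuffled, y_shuffled = X_arr[perm], y_arr[perm]

assert (X_shuffled == y_shuffled).all()  # pairs stay aligned after shuffling
```

The shuffled arrays can then be passed straight to tf.data.Dataset.from_tensor_slices and split with take/skip as above.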
Davide Anghileri answered Dec 03 '25

