Can anyone help me split the MNIST dataset into training, testing and validation sets with whatever ratios I choose?
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
I want a 70-20-10 split for training, validation and testing.
Assuming you do not want to keep the default train/test split provided by the tf.keras.datasets.mnist API, you can concatenate the train and test sets and then iteratively split the result into train, validation and test sets according to your ratios.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
DATASET_SIZE = 70000
TRAIN_RATIO = 0.7
VALIDATION_RATIO = 0.2
TEST_RATIO = 0.1
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
X = np.concatenate([x_train, x_test])
y = np.concatenate([y_train, y_test])
If you want the datasets to be NumPy arrays, you can use the train_test_split() function from sklearn.model_selection. Here is an example:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=(1-TRAIN_RATIO))
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=((TEST_RATIO/(VALIDATION_RATIO+TEST_RATIO))))
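To sanity-check the two-step split, here is a minimal self-contained sketch. It uses zero-filled arrays of the same shape as the concatenated MNIST data (70000 samples of 28x28) as a stand-in, so it runs without downloading anything; the random_state values are arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

TRAIN_RATIO, VALIDATION_RATIO, TEST_RATIO = 0.7, 0.2, 0.1

# Synthetic stand-in for the concatenated MNIST arrays.
X = np.zeros((70000, 28, 28), dtype=np.uint8)
y = np.zeros(70000, dtype=np.uint8)

# Step 1: split off the training set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=(1 - TRAIN_RATIO), random_state=42
)
# Step 2: divide the remainder between validation and test
# according to their relative proportions (0.2 : 0.1).
X_val, X_test, y_val, y_test = train_test_split(
    X_val, y_val,
    test_size=TEST_RATIO / (VALIDATION_RATIO + TEST_RATIO),
    random_state=42,
)

# The three parts partition the original data, roughly 49000/14000/7000
# (exact counts can be off by one due to floating-point rounding).
print(len(X_train), len(X_val), len(X_test))
```

Note that train_test_split shuffles by default, so the two test-MNIST halves are already mixed after the first call.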
If you prefer to use the tf Dataset API then you can use the .take() and .skip() methods as follows:
dataset = tf.data.Dataset.from_tensor_slices((X, y))
train_dataset = dataset.take(int(TRAIN_RATIO*DATASET_SIZE))
validation_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).take(int(VALIDATION_RATIO*DATASET_SIZE))
test_dataset = dataset.skip(int(TRAIN_RATIO*DATASET_SIZE)).skip(int(VALIDATION_RATIO*DATASET_SIZE))
Furthermore, you should call .shuffle() on the dataset before the split to generate shuffled partitions. Note that .shuffle() requires a buffer_size argument, and you should pass reshuffle_each_iteration=False (or a fixed seed) so the dataset is not reshuffled between the .take()/.skip() calls, which would otherwise leak samples across the partitions:
dataset = dataset.shuffle(buffer_size=DATASET_SIZE, reshuffle_each_iteration=False)
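Putting the tf.data variant together, here is a runnable sketch. It uses a small synthetic dataset of 1000 elements instead of the real 70000-sample MNIST arrays so it runs quickly without a download; the seed value is arbitrary:

```python
import numpy as np
import tensorflow as tf

DATASET_SIZE = 1000  # small stand-in for the 70000 MNIST samples
TRAIN_RATIO, VALIDATION_RATIO = 0.7, 0.2

X = np.zeros((DATASET_SIZE, 28, 28), dtype=np.uint8)
y = np.arange(DATASET_SIZE, dtype=np.int32)

dataset = tf.data.Dataset.from_tensor_slices((X, y))
# Shuffle once, with a fixed seed and no reshuffling between epochs,
# so that take()/skip() carve out disjoint, stable partitions.
dataset = dataset.shuffle(
    buffer_size=DATASET_SIZE, seed=42, reshuffle_each_iteration=False
)

train_size = int(TRAIN_RATIO * DATASET_SIZE)
val_size = int(VALIDATION_RATIO * DATASET_SIZE)

train_dataset = dataset.take(train_size)
validation_dataset = dataset.skip(train_size).take(val_size)
test_dataset = dataset.skip(train_size + val_size)

# cardinality() reports the number of elements in each partition:
print(train_dataset.cardinality().numpy())       # 700
print(validation_dataset.cardinality().numpy())  # 200
print(test_dataset.cardinality().numpy())        # 100
```

Because the shuffle is fixed, iterating the three datasets yields three disjoint subsets that together cover every sample exactly once.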