Validation dataset in PyTorch using DataLoaders

Tags:

pytorch

I want to load the MNIST dataset with PyTorch and Torchvision and divide it into training, validation and test parts. So far I have:

import torch
import torchvision

def load_dataset(batch_size_train, batch_size_test):
    train_loader = torch.utils.data.DataLoader(
        torchvision.datasets.MNIST(
            '/data/', train=True, download=True,
            transform=torchvision.transforms.Compose([
                torchvision.transforms.ToTensor()])),
        batch_size=batch_size_train, shuffle=True)

    test_loader = torch.utils.data.DataLoader(
        torchvision.datasets.MNIST(
            '/data/', train=False, download=True,
            transform=torchvision.transforms.Compose([
                torchvision.transforms.ToTensor()])),
        batch_size=batch_size_test, shuffle=True)

    return train_loader, test_loader

How can I split the training dataset into training and validation sets when it's already wrapped in a DataLoader? I want to use the last 10,000 examples of the training set for validation (I know cross-validation would give more reliable results; I just want a quick validation here).


asked Sep 27 '20 19:09

qalis


2 Answers

Splitting the training dataset into training and validation in PyTorch turns out to be much harder than it should be.

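First, split the training set into training and validation subsets with random_split. Here train is the MNIST training dataset, e.g. torchvision.datasets.MNIST('/data/', train=True, download=True, transform=torchvision.transforms.ToTensor()). random_split returns Subset objects; Subset is itself a subclass of Dataset that wraps the original dataset together with a list of indices. Note that it picks 10,000 random examples rather than the last 10,000; the seeded generator makes the split reproducible:

train_subset, val_subset = torch.utils.data.random_split(
    train, [50000, 10000], generator=torch.Generator().manual_seed(1))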
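The question asks for the last 10,000 examples rather than a random 10,000. A minimal sketch for that, assuming a fixed, contiguous hold-out is what you want, is to build both Subsets from index ranges. The helper name split_off_last is hypothetical, and a tiny synthetic TensorDataset stands in for MNIST so the snippet runs without downloading anything.

```python
import torch
from torch.utils.data import Dataset, Subset, TensorDataset


def split_off_last(dataset: Dataset, n_val: int) -> tuple[Subset, Subset]:
    """Hypothetical helper: split `dataset` into (train, val), where val
    holds the last `n_val` items. Deterministic and order-preserving,
    unlike random_split."""
    n_total = len(dataset)
    if not 0 < n_val < n_total:
        raise ValueError(f"n_val must be between 1 and {n_total - 1}, got {n_val}")
    train_part = Subset(dataset, range(0, n_total - n_val))
    val_part = Subset(dataset, range(n_total - n_val, n_total))
    return train_part, val_part


# Synthetic stand-in for MNIST: 60 samples whose label is simply their index.
fake = TensorDataset(torch.zeros(60, 1, 28, 28), torch.arange(60))
train_part, val_part = split_off_last(fake, 10)
print(len(train_part), len(val_part))  # 50 10
print(val_part[0][1].item())           # 50 (first item of the last ten)
```

With the real data this would be split_off_last(train, 10000), giving 50,000 training and 10,000 validation examples. A contiguous split is only sensible when the dataset is not sorted by label; MNIST's training set is not, but it is worth checking for other datasets.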

If you need the raw tensors, index the underlying dataset's data and targets with each subset's indices:

X_train = train_subset.dataset.data[train_subset.indices]
y_train = train_subset.dataset.targets[train_subset.indices]

X_val = val_subset.dataset.data[val_subset.indices]
y_val = val_subset.dataset.targets[val_subset.indices]
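These raw tensors are uint8 images of shape (N, 28, 28), without the ToTensor transform applied. If you do want to batch them, one option is to reproduce what ToTensor does (scale to floats in [0, 1] and add a channel dimension) and wrap the result in a TensorDataset. This is a sketch; the helper name to_tensor_dataset is hypothetical, and random uint8 tensors stand in for X_val and y_val.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def to_tensor_dataset(images_uint8: torch.Tensor, labels: torch.Tensor) -> TensorDataset:
    """Hypothetical helper: wrap raw (N, 28, 28) uint8 images in a TensorDataset.

    Mimics ToTensor(): converts to float32 in [0, 1] and adds a channel
    dimension, giving (N, 1, 28, 28).
    """
    images = images_uint8.unsqueeze(1).float().div(255.0)
    return TensorDataset(images, labels)


# Random stand-ins for X_val / y_val (not real MNIST data).
X_val = torch.randint(0, 256, (100, 28, 28), dtype=torch.uint8)
y_val = torch.randint(0, 10, (100,))

val_ds = to_tensor_dataset(X_val, y_val)
loader = DataLoader(val_ds, batch_size=32, shuffle=False)
images, labels = next(iter(loader))
print(images.shape, images.dtype)  # torch.Size([32, 1, 28, 28]) torch.float32
```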

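Note that these are plain tensors holding the raw uint8 pixels (the ToTensor transform is not applied), not Dataset objects, so you can't pass them to a DataLoader as they are. Usually you don't need this step at all: because a Subset is a Dataset, DataLoader works with it directly: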

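from torch.utils.data import DataLoader
BATCH_SIZE = 64  # example value; use whatever batch size you need
train_loader = DataLoader(dataset=train_subset, shuffle=True, batch_size=BATCH_SIZE)
val_loader = DataLoader(dataset=val_subset, shuffle=False, batch_size=BATCH_SIZE)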
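A quick way to convince yourself that the Subsets batch like any other dataset and that the two splits don't overlap. This sketch uses a small synthetic TensorDataset (600 random samples) in place of MNIST, so the sizes are scaled down from 50,000/10,000 to 500/100.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset, random_split

# Synthetic stand-in for the MNIST training set (random data, 600 samples).
full = TensorDataset(torch.rand(600, 1, 28, 28), torch.randint(0, 10, (600,)))

train_subset, val_subset = random_split(
    full, [500, 100], generator=torch.Generator().manual_seed(1))

# Subset inherits from Dataset, so DataLoader accepts it without conversion.
assert isinstance(train_subset, Dataset)

# The two index lists are disjoint.
overlap = set(train_subset.indices) & set(val_subset.indices)
print(len(overlap))  # 0

train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=64, shuffle=False)

images, labels = next(iter(val_loader))
print(images.shape)     # torch.Size([64, 1, 28, 28])
print(len(val_loader))  # 2 (100 samples at 64 per batch; the last batch is partial)
```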


answered Oct 08 '22 16:10

qalis

If you'd like both splits to keep the same class proportions (a stratified split), you can use train_test_split from sklearn.

import torchvision
from torch.utils.data import DataLoader, Subset
from sklearn.model_selection import train_test_split

VAL_SIZE = 0.1  # fraction held out for validation: 6,000 of the 60,000 training images
BATCH_SIZE = 64

mnist_train = torchvision.datasets.MNIST(
    '/data/',
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
)
mnist_test = torchvision.datasets.MNIST(
    '/data/',
    train=False,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
)

# split integer indices rather than the data itself; stratify keeps class proportions equal in both splits
train_indices, val_indices, _, _ = train_test_split(
    range(len(mnist_train)),
    mnist_train.targets,
    stratify=mnist_train.targets,
    test_size=VAL_SIZE,
)

# generate subset based on indices
train_split = Subset(mnist_train, train_indices)
val_split = Subset(mnist_train, val_indices)

# create batches
train_batches = DataLoader(train_split, batch_size=BATCH_SIZE, shuffle=True)
val_batches = DataLoader(val_split, batch_size=BATCH_SIZE, shuffle=False)
test_batches = DataLoader(mnist_test, batch_size=BATCH_SIZE, shuffle=False)
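To check that stratify really preserves class proportions, here is a small sketch on synthetic, deliberately imbalanced labels (900 of class 0, 100 of class 1) instead of MNIST. It follows the same pattern of splitting indices with train_test_split and wrapping them in Subsets; random_state is added here only to make the run reproducible.

```python
from collections import Counter

import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset, TensorDataset

# Synthetic, imbalanced labels: 900 samples of class 0, 100 of class 1.
labels = torch.cat([torch.zeros(900, dtype=torch.long),
                    torch.ones(100, dtype=torch.long)])
dataset = TensorDataset(torch.rand(1000, 4), labels)

# Split the indices, stratified on the labels.
train_idx, val_idx = train_test_split(
    list(range(len(dataset))),
    test_size=0.1,
    stratify=labels.numpy(),
    random_state=0,
)
train_split = Subset(dataset, train_idx)
val_split = Subset(dataset, val_idx)


def class_counts(indices):
    """Count how many samples of each class the given indices select."""
    return Counter(labels[indices].tolist())


print(class_counts(train_idx))  # Counter({0: 810, 1: 90})
print(class_counts(val_idx))    # Counter({0: 90, 1: 10})
```

Both splits keep the original 9:1 ratio; a plain random split would only match it approximately.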

answered Oct 08 '22 16:10

Eric
