I have a network that I want to train on some dataset (as an example, say CIFAR10). I can create a data loader object via
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
My question is as follows: suppose I want to make several different training iterations. Let's say I want to first train the network on all images in odd positions, then on all images in even positions, and so on. In order to do that, I need to be able to access those images. Unfortunately, it seems that trainset does not allow such access: trying to do trainset[:1000], or more generally trainset[mask], will throw an error.
I could instead do

    trainset.train_data = trainset.train_data[mask]
    trainset.train_labels = trainset.train_labels[mask]
and then
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
However, that would force me to create a new copy of the full dataset in each iteration (since I have already modified trainset.train_data, I would need to redefine trainset). Is there some way to avoid this?
Ideally, I would like to have something "equivalent" to
    trainloader = torch.utils.data.DataLoader(trainset[mask], batch_size=4, shuffle=True, num_workers=2)
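One way to get this effect without copying or modifying the dataset is to keep trainset as-is and give the DataLoader a sampler restricted to the desired indices, for example torch.utils.data.SubsetRandomSampler. The sketch below is illustrative: the odd-position index list stands in for an arbitrary mask, and the variable names (odd_indices, sampler) are just examples.

    import torch
    import torchvision
    import torchvision.transforms as transforms

    transform = transforms.ToTensor()
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)

    # Train only on the images in odd positions; any index mask works the same way.
    odd_indices = list(range(1, len(trainset), 2))
    sampler = torch.utils.data.SubsetRandomSampler(odd_indices)

    # With a sampler, shuffle must stay at its default (False);
    # the sampler itself shuffles within the chosen indices.
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                              sampler=sampler, num_workers=2)

Switching to the even positions then only means building a new sampler; trainset itself is never copied or redefined.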
torch.utils.data.Subset is easier, supports shuffle, and doesn't require writing your own sampler:
    import torch
    import torchvision
    import torchvision.transforms as transforms

    # Use a tensor transform so that the default collate_fn can batch the samples.
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transforms.ToTensor())

    evens = list(range(0, len(trainset), 2))
    odds = list(range(1, len(trainset), 2))
    trainset_1 = torch.utils.data.Subset(trainset, evens)
    trainset_2 = torch.utils.data.Subset(trainset, odds)

    trainloader_1 = torch.utils.data.DataLoader(trainset_1, batch_size=4,
                                                shuffle=True, num_workers=2)
    trainloader_2 = torch.utils.data.DataLoader(trainset_2, batch_size=4,
                                                shuffle=True, num_workers=2)
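As a rough usage sketch (the model, loss function, and optimizer below are illustrative placeholders, not part of the answer), the two loaders can then drive the alternating training passes described in the question:

    import torch.nn as nn
    import torch.optim as optim
    import torchvision

    # Placeholder model/loss/optimizer; substitute your own network here.
    model = torchvision.models.resnet18(num_classes=10)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # First a pass over the even-position images, then over the odd-position ones.
    for loader in (trainloader_1, trainloader_2):
        for inputs, labels in loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()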