
Get single random example from PyTorch DataLoader

Tags: python, pytorch

How do I get a single random example from a PyTorch DataLoader?

If my DataLoader gives minibatches of multiple images and labels, how do I get a single random image and label?

Note that I don't want a single image and label per minibatch; I want a total of one example.

Likes: 988
Tom Hale asked Dec 01 '18 12:12


5 Answers

If your DataLoader is something like this:

test_loader = DataLoader(image_datasets['val'], batch_size=batch_size, shuffle=True)

each iteration yields one batch of batch_size examples. Because shuffle=True, the batch contents are random, so you can pick out a single random example by indexing into the first batch:

for test_images, test_labels in test_loader:
    sample_image = test_images[0]    # Reshape them according to your needs.
    sample_label = test_labels[0]
    break    # Stop after the first batch; one example is all we need.

Alternative solutions

  1. You can use RandomSampler to obtain random samples.

  2. Use a batch_size of 1 in your DataLoader.

  3. Directly take samples from your Dataset like so:

     from torchvision import datasets
     mnist_test = datasets.MNIST('../MNIST/', train=False, transform=transform)
    

    Now use this dataset to take samples (iteration follows dataset order, so it is not random):

     for image, label in mnist_test:
          # do something with image and other attributes
          pass
    
  4. (Probably the best) Take a single batch directly with next(iter(...)):

     inputs, classes = next(iter(dataloader))   
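
Alternatives 1 and 2 from the list above can be combined: a RandomSampler that draws one index, fed to a DataLoader with batch_size=1. A minimal sketch, assuming a toy TensorDataset in place of your real dataset (the tensor shapes and variable names are illustrative only):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Toy stand-in for a real dataset: 10 "images" of shape (1, 4, 4).
# Image i is filled with the value i and carries the label i.
images = torch.arange(10.).view(10, 1, 1, 1).repeat(1, 1, 4, 4)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

# Draw exactly one random index from the dataset.
sampler = RandomSampler(dataset, replacement=True, num_samples=1)
loader = DataLoader(dataset, batch_size=1, sampler=sampler)

# The loader now yields a single batch holding a single example.
image_batch, label_batch = next(iter(loader))
image, label = image_batch[0], label_batch[0]  # drop the batch dimension
```

Every new iter(loader) call draws a fresh random index, so repeating the last two lines gives a new random example each time.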
    
Likes: 50
parthagar answered Oct 04 '22 04:10


If you want to choose specific images from your Trainloader/Testloader, you should check out the Subset class in torch.utils.data.

Here's an example of how to use it:

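import torch

# ImageFolderWithPaths is a user-defined dataset class (not part of torchvision);
# any map-style Dataset, such as torchvision.datasets.ImageFolder, works the same way.
testset = ImageFolderWithPaths(root="path/to/your/Image_Data/Test/", transform=transform)
subset_indices = [0] # select your indices here as a list
subset = torch.utils.data.Subset(testset, subset_indices)
testloader_subset = torch.utils.data.DataLoader(subset, batch_size=1, num_workers=0, shuffle=False)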

This way you can use exactly one image and label. However, you can of course use more than just one index in your subset_indices.

If you want to use a specific image from your DatasetFolder (or ImageFolder), you can use dataset.samples, a list of (path, class_index) tuples, to build a dictionary that maps each image path to its index.
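
The path-to-index dictionary can be sketched as follows. This is only an illustration: TinyFolder is a hypothetical stand-in that mimics the samples attribute of torchvision's ImageFolder, and the file paths and fake pixel values are made up so the example runs without any image files.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset

class TinyFolder(Dataset):
    """Hypothetical stand-in exposing `samples` the way ImageFolder does."""

    def __init__(self):
        # (path, class_index) pairs; the paths are invented for this sketch.
        self.samples = [("cat/001.png", 0), ("cat/002.png", 0), ("dog/001.png", 1)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, target = self.samples[index]
        # Fake pixels instead of loading the file: a tensor filled with the index.
        image = torch.full((3, 2, 2), float(index))
        return image, target

testset = TinyFolder()

# Map each image path to its position in the dataset.
path_to_index = {path: i for i, (path, _) in enumerate(testset.samples)}

# Select one specific image by path and wrap it in a one-element loader.
subset = Subset(testset, [path_to_index["dog/001.png"]])
loader = DataLoader(subset, batch_size=1, shuffle=False)
image, label = [x[0] for x in next(iter(loader))]
```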

Likes: 22
lschmidt90 answered Oct 04 '22 04:10


(This answer is to supplement Alternative 3 of @parthagar's answer)

Iterating through the dataset does not return "random" examples; it returns them in index order. You should instead use:

import numpy

# Recovers the original `dataset` from the `dataloader`
dataset = dataloader.dataset
n_samples = len(dataset)

# Get a random sample
random_index = numpy.random.randint(n_samples)  # uniform integer in [0, n_samples)
single_example = dataset[random_index]

Likes: 9
johnnyasd12 answered Oct 04 '22 04:10


TL;DR:

The general form to get a single example from a DataLoader is:

example = [ x[0] for x in next(iter(trainloader)) ]

For the question asked, where minibatches of images and labels are returned (the example is random only if the DataLoader shuffles):

image, label = [ x[0] for x in next(iter(trainloader)) ]

Possibly interesting information:

To get a single minibatch from the DataLoader, use:

next(iter(trainloader))

When running something like for images, labels in dataloader: what happens under the hood is that an iterator is created via iter(dataloader), and then next() is called on that iterator on each loop execution.


To get a single image from a DataLoader that returns images and labels, use:

image = next(iter(trainloader))[0][0]

This is the same as doing:

images, labels = next(iter(trainloader))
image = images[0]
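
The loop mechanics described above can be written out by hand. A minimal sketch, assuming Python 3 and a toy TensorDataset (the dataset contents and the seen list are illustrative only):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Six toy examples: features of shape (1,) and integer labels 0..5.
dataset = TensorDataset(torch.arange(6.).view(6, 1), torch.arange(6))
dataloader = DataLoader(dataset, batch_size=2, shuffle=False)

# Hand-written equivalent of `for images, labels in dataloader:`.
seen = []
iterator = iter(dataloader)  # creates a fresh iterator over the batches
while True:
    try:
        images, labels = next(iterator)  # fetch the next minibatch
    except StopIteration:  # raised once the batches are exhausted
        break
    seen.append(labels.tolist())
```

Because each call to iter(dataloader) starts a new pass, next(iter(dataloader)) always returns the first batch of a fresh pass, which is a new random batch when shuffle=True.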

Likes: 8
Tom Hale answered Oct 04 '22 03:10


Random sample from DataLoader

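Assuming DataLoader(shuffle=True) was used in its construction and each batch is a single tensor, a single random example can be drawn from the DataLoader with:

example = next(iter(dataloader))[0]

(If each batch is an (images, labels) pair, this returns the whole image batch; use next(iter(dataloader))[0][0] to get one image.)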

Random sample from Dataset

If shuffle=True was not used, you can draw a single random example from the Dataset with:

import torch
idx = torch.randint(len(dataset), (1,)).item()    # .item() turns the 1-element tensor into a plain int
example = dataset[idx]
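
A self-contained sketch of drawing one random example straight from a dataset, assuming a toy TensorDataset in place of your own. The seeded torch.Generator is an optional addition (not part of the answer above) that makes the draw reproducible. Converting the index to an int with .item() keeps TensorDataset from returning an extra leading dimension.

```python
import torch
from torch.utils.data import TensorDataset

# Five toy examples: features of shape (1,) and integer labels 0..4.
dataset = TensorDataset(torch.arange(5.).view(5, 1), torch.arange(5))

# Optional: a seeded generator makes the "random" index repeatable.
generator = torch.Generator().manual_seed(0)

# Uniform random integer in [0, len(dataset)), converted to a plain int.
idx = torch.randint(len(dataset), (1,), generator=generator).item()
image, label = dataset[idx]
```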

Likes: 2
iacob answered Oct 04 '22 02:10