
PyTorch: How to use DataLoaders for custom Datasets

How can I use torch.utils.data.Dataset and torch.utils.data.DataLoader on my own data (not just torchvision.datasets)?

Is there a way to use the built-in DataLoaders, which are used with the torchvision datasets, on any dataset?

Sarthak asked Jan 29 '17 18:01




1 Answer

Yes, that is possible. Just create the objects yourself, e.g.

    import torch.utils.data as data_utils

    train = data_utils.TensorDataset(features, targets)
    train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

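where features and targets are tensors whose first dimensions have the same size. features is typically 2-D, i.e. a matrix where each row represents one training sample (higher-dimensional tensors also work, as long as the first dimension indexes the samples), and targets may be 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector.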
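To make the snippet above concrete, here is a minimal runnable sketch. The random toy data (200 samples, 10 features each, scalar targets) is an assumption for illustration only and is not part of the original question.

```python
import torch
import torch.utils.data as data_utils

# Hypothetical toy data: 200 samples with 10 features each, one scalar target per sample.
features = torch.randn(200, 10)
targets = torch.randn(200)

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

# Each iteration yields one mini-batch of features and the matching targets.
batch_shapes = []
for batch_features, batch_targets in train_loader:
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_targets.shape)))

print(len(train), len(train_loader))  # number of samples, number of batches
print(batch_shapes[0])                # shapes of the first mini-batch
```

With 200 samples and batch_size=50 the loader produces four batches per epoch; shuffle=True only changes which samples land in which batch.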

Hope that helps!


EDIT: response to @sarthak's question

Basically yes. If you create an object of type TensorDataset, then the constructor checks whether the first dimensions of the feature tensor (which is actually called data_tensor) and the target tensor (called target_tensor) have the same size:

assert data_tensor.size(0) == target_tensor.size(0) 
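The check can be observed directly. The sketch below assumes a reasonably recent PyTorch release, where TensorDataset accepts any number of tensors and applies the same first-dimension check to all of them via an assert; the small 8x4x4x3 tensor stands in for the 5000xnxnx3 dataset and is purely hypothetical.

```python
import torch
import torch.utils.data as data_utils

# Hypothetical image-like data: 8 samples of shape 4x4x3, one label per sample.
images = torch.zeros(8, 4, 4, 3)
labels = torch.arange(8)

# First dimensions match (8 == 8), so a 4-D feature tensor is accepted as-is.
dataset = data_utils.TensorDataset(images, labels)
sample, label = dataset[0]
sample_shape = tuple(sample.shape)  # shape of a single sample, without the batch dimension

# A mismatch in the first dimension is rejected by the constructor.
# Assumption: the check is implemented as an assert, so AssertionError is raised.
try:
    data_utils.TensorDataset(images, labels[:5])
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True

print(sample_shape, mismatch_rejected)
```

Only the first dimension is compared, so the remaining dimensions of the two tensors are free to differ.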

However, if you want to feed these data into a neural network afterwards, you need to be careful. While convolution layers work on data like yours, (I think) all other layer types expect the data to be given in matrix form. So, if you run into an issue like this, an easy solution is to convert your 4-D dataset (given as some kind of tensor, e.g. a FloatTensor) into a matrix using the view method. For your 5000xnxnx3 dataset, this would look like this:

    dataset_2d = dataset_4d.view(5000, -1)

(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
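As a small sketch of the flattening step, assuming a hypothetical side length n = 4 and an all-zeros tensor in place of the real images (the names dataset_4d, dataset_2d and linear are illustrative only):

```python
import torch

n = 4  # hypothetical image side length; the real value comes from your data
dataset_4d = torch.zeros(5000, n, n, 3)

# Keep the sample dimension and merge the remaining three into one.
dataset_2d = dataset_4d.view(5000, -1)
print(dataset_2d.shape)  # second dimension is n * n * 3

# Reading the sample count from the tensor avoids hard-coding 5000.
same_result = dataset_4d.view(dataset_4d.size(0), -1)

# The flattened matrix can be passed to a fully connected layer.
linear = torch.nn.Linear(n * n * 3, 2)
out = linear(dataset_2d[:10])
print(out.shape)
```

view does not copy the data; it returns a tensor that shares storage with the original, which is why it requires a contiguous memory layout.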

pho7 answered Sep 23 '22 10:09