
Question

I am trying to load two datasets and use them both for training.

Package versions: Python 3.7; PyTorch 1.3.1

It is possible to create the data loaders separately and train on them sequentially:

from torch.utils.data import DataLoader, ConcatDataset
from tqdm import tqdm

# ModelNet, MyDataset, args and transform_modelnet are defined elsewhere in the script
train_loader_modelnet = DataLoader(ModelNet(args.modelnet_root, categories=args.modelnet_categories,
                                            split='train', transform=transform_modelnet, device=args.device),
                                   batch_size=args.batch_size, shuffle=True)

train_loader_mydata = DataLoader(MyDataset(args.customdata_root, categories=args.mydata_categories,
                                           split='train', device=args.device),
                                 batch_size=args.batch_size, shuffle=True)

for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader_modelnet)):
        pass  # training on dataset1
    for idx, batch in enumerate(tqdm(train_loader_mydata)):
        pass  # training on dataset2

Note: MyDataset is a custom dataset class that implements __len__(self) and __getitem__(self, index). Since the configuration above works, the implementation itself seems to be OK.
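For reference, here is a minimal sketch of the map-style dataset interface the note refers to. ToyPointCloudDataset is hypothetical (random tensors stand in for point clouds loaded from disk) and is not the asker's MyDataset; it only shows that __len__ and __getitem__ are all a DataLoader needs.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyPointCloudDataset(Dataset):
    """Hypothetical minimal map-style dataset; not the asker's MyDataset."""

    def __init__(self, num_samples, num_points=1024):
        # Random point clouds and integer class labels stand in for real files
        self.points = torch.rand(num_samples, num_points, 3)
        self.labels = torch.randint(0, 10, (num_samples,))

    def __len__(self):
        # The DataLoader's sampler draws indices from range(len(dataset))
        return len(self.points)

    def __getitem__(self, index):
        # One sample as (tensor, tensor), so default collation can stack it
        return self.points[index], self.labels[index]


loader = DataLoader(ToyPointCloudDataset(10, num_points=16), batch_size=4, shuffle=True)
for points, labels in loader:
    print(points.shape, labels.shape)
```

With 10 samples and batch_size=4 the loader yields batches of 4, 4 and 2 samples.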

Ideally, though, I would like to combine them into a single DataLoader object. I attempted this following the PyTorch documentation:

from torch.utils.data import ConcatDataset, DataLoader

train_modelnet = ModelNet(args.modelnet_root, categories=args.modelnet_categories,
                          split='train', transform=transform_modelnet, device=args.device)
train_mydata = MyDataset(args.customdata_root, categories=args.mydata_categories,
                         split='train', device=args.device)
# ConcatDataset takes a list of datasets and returns a Dataset, which still needs a DataLoader
train_set = ConcatDataset([train_modelnet, train_mydata])
train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True)

for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader)):
        pass  # training on combined

However, on some batches, seemingly at random, I get an error of the form 'expected Tensor as element X in argument 0, but got tuple'. Any help would be much appreciated!

>  40%|████      | 53/131 [01:03<02:00,  1.55s/it]
> Traceback (most recent call last):
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/pydevd.py", line 1434, in _exec
>     pydev_imports.execfile(file, globals, locals)  # execute the script
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/home/chris/Documents/4yp/Data/my_kaolin/Classification/pointcloud_classification_combinedset.py", line 83, in <module>
>     for idx, batch in enumerate(tqdm(train_loader)):
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
>     for obj in iterable:
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
>     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
>     return self.collate_fn(data)
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
>     return torch.stack(batch, 0, out=out)
> TypeError: expected Tensor as element 3 in argument 0, but got tuple

jvel07 · Accepted Answer

If I understand your question correctly, you have train and dev sets (and their corresponding loaders) as follows:

train_set = CustomDataset(...)
train_loader = DataLoader(dataset=train_set, ...)
dev_set = CustomDataset(...)
dev_loader = DataLoader(dataset=dev_set, ...)

And you want to concatenate them in order to use train+dev as the training data, right? If so, simply call:

train_dev_sets = torch.utils.data.ConcatDataset([train_set, dev_set])
train_dev_loader = DataLoader(dataset=train_dev_sets, ...)

train_dev_loader is now a single loader that yields data from both sets.
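A small runnable sketch of the same pattern, assuming two toy TensorDatasets in place of the CustomDataset instances above. It shows that ConcatDataset takes a single list of datasets, that its indices run through the first dataset and then the second, and that the result is wrapped in an ordinary DataLoader.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Two toy datasets with identical sample structure: (features of shape (5,), int64 label)
train_set = TensorDataset(torch.zeros(6, 5), torch.zeros(6, dtype=torch.long))
dev_set = TensorDataset(torch.ones(4, 5), torch.ones(4, dtype=torch.long))

# ConcatDataset takes one iterable of datasets, not the datasets as separate arguments
train_dev_sets = ConcatDataset([train_set, dev_set])
print(len(train_dev_sets))               # 10
print(train_dev_sets.cumulative_sizes)   # [6, 10]

# Indices 0-5 map to train_set, indices 6-9 map to dev_set
print(train_dev_sets[6][1])              # tensor(1)

train_dev_loader = DataLoader(dataset=train_dev_sets, batch_size=4, shuffle=True)
for features, labels in train_dev_loader:
    # With shuffle=True a single batch can mix samples from both datasets
    print(features.shape, labels.tolist())
```

Because shuffling happens over the concatenated index range, batches mix both sources, which is also why a structural difference between the two datasets only shows up on some batches.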

Make sure that both datasets return samples with the same structure, shapes and types: the same number of features, the same label format (categories/numbers), and so on.
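To illustrate why the structure has to match, here is a sketch that reproduces the question's error with a hypothetical PointsDataset. Each sample is (points, label); with wrap_points=True the points field becomes a tuple of tensors instead of a tensor. With shuffle=False and batch_size=4, the first batch takes three samples from the first dataset and one from the second.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class PointsDataset(Dataset):
    """Hypothetical stand-in: each sample is (points, label), points is a (4, 3) tensor."""

    def __init__(self, n, wrap_points=False):
        self.n = n
        # wrap_points=True mimics a dataset whose first field is a tuple, not a tensor
        self.wrap_points = wrap_points

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        points = torch.zeros(4, 3)
        if self.wrap_points:
            points = (points, torch.ones(4))  # different structure from the other dataset
        return points, index % 2


def first_batch(datasets):
    # Concatenate the datasets and fetch one batch that spans both of them
    loader = DataLoader(ConcatDataset(datasets), batch_size=4, shuffle=False)
    return next(iter(loader))


# Same structure in both datasets: default collation stacks the samples
points, labels = first_batch([PointsDataset(3), PointsDataset(3)])
print(points.shape, labels)  # torch.Size([4, 4, 3]) tensor([0, 1, 0, 0])

# Mismatched structure: sample 3 comes from the second dataset and is a tuple
try:
    first_batch([PointsDataset(3), PointsDataset(3, wrap_points=True)])
except TypeError as err:
    print("TypeError:", err)
```

The failing case raises a TypeError of the same kind as in the question's traceback, because torch.stack receives a tuple at position 3 of the batch; the exact message wording may differ between PyTorch versions.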

Leopd · Answer

I'd guess the two datasets sometimes return different types. When the data are tensors, PyTorch stacks them, so they had better be the same shape. If they're something like strings, PyTorch collects them into a tuple instead. So it sounds like one of your datasets sometimes returns something that's not a tensor. I'd put some asserts on the output of your datasets to check that they're doing what you want, or dive in with pdb.
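Following the suggestion to put asserts on the datasets' output, here is a minimal sketch of such a check. describe and check_consistent are hypothetical helpers, not part of PyTorch. They compare each sample's nesting, tensor shapes and dtypes against the first sample of the first dataset, which matches what default collation needs but is stricter than necessary if you use a custom collate_fn.

```python
import torch


def describe(sample):
    """Summarize a sample's structure: tensors by shape and dtype, containers recursively."""
    if torch.is_tensor(sample):
        return ("Tensor", tuple(sample.shape), sample.dtype)
    if isinstance(sample, (tuple, list)):
        return (type(sample).__name__, tuple(describe(s) for s in sample))
    # Anything else (int, str, ...) is summarized by its type name only
    return (type(sample).__name__,)


def check_consistent(*datasets, num_samples=None):
    """Assert every checked sample has the same structure as datasets[0][0]."""
    reference = describe(datasets[0][0])
    for d_idx, dataset in enumerate(datasets):
        # Optionally check only the first num_samples items, since loading can be slow
        n = len(dataset) if num_samples is None else min(num_samples, len(dataset))
        for i in range(n):
            got = describe(dataset[i])
            assert got == reference, (
                f"dataset {d_idx}, index {i}: expected {reference}, got {got}"
            )
```

For example, calling check_consistent(train_modelnet, train_mydata, num_samples=100) before building the ConcatDataset should report the first index whose structure differs, instead of failing later inside default_collate.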

Pytorch - Concatenating Datasets before using Dataloader

Tags:

python

machine-learning

tensorflow

dataset

chrispduck
