I am trying to use the PyTorch DataLoader to define my own dataset, but I am not sure how to load multiple data sources.
My current code:
import json
import torch
from torch.utils.data import Dataset

class MultipleSourceDataSet(Dataset):
    def __init__(self, json_file, root_dir, transform=None):
        with open(root_dir + 'block0.json') as f:
            self.result = torch.Tensor(json.load(f))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.result[0])

    def __getitem__(self, idx):
        return None  # not implemented yet
The data source is 50 blocks under root_dir = ~/Documents/blocks/.
I split them up and avoided combining them into a single file beforehand, since this is a very big dataset.
How can I load them into a single DataLoader?
For DataLoader you need to have a single Dataset; your problem is that you have multiple 'json' files and you only know how to create a Dataset from each 'json' separately. What you can do in this case is to use ConcatDataset, which contains all the single-'json' datasets you create:
import os
import torch.utils.data as data

class SingleJsonDataset(data.Dataset):
    # implement a single-json dataset here...
    pass

list_of_datasets = []
for j in os.listdir(root_dir):
    if not j.endswith('.json'):
        continue  # skip non-json files
    list_of_datasets.append(SingleJsonDataset(json_file=j, root_dir=root_dir, transform=None))

# once all single-json datasets are created you can concat them into a single one:
multiple_json_dataset = data.ConcatDataset(list_of_datasets)
Now you can feed the concatenated dataset into data.DataLoader.
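For completeness, here is a minimal sketch of what the single-json dataset and the final DataLoader could look like. It assumes each block file decodes to a plain list of numeric samples; the indexing logic and the batch size are illustrative assumptions, not part of the original answer.

import json
import os
import torch
import torch.utils.data as data

class SingleJsonDataset(data.Dataset):
    # assumption: each json block decodes to a list of numeric samples
    def __init__(self, json_file, root_dir, transform=None):
        with open(os.path.join(root_dir, json_file)) as f:
            self.samples = torch.tensor(json.load(f))
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample

# build one dataset per block, concatenate them, then wrap in a DataLoader
root_dir = os.path.expanduser('~/Documents/blocks/')
datasets = [SingleJsonDataset(j, root_dir)
            for j in os.listdir(root_dir) if j.endswith('.json')]
loader = data.DataLoader(data.ConcatDataset(datasets), batch_size=32, shuffle=True)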
I should revise my question as 2 different sub-questions:
1. How should I handle a dataset that is too big to load into memory at once?
2. If I am separating a large dataset into small chunks, how can I load multiple mini-datasets?
For question 1:
The PyTorch DataLoader can prevent this issue by creating mini-batches, so the full dataset never has to be loaded at once. Here you can find further explanations.
For question 2:
Please refer to Shai's answer above.
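As a quick illustration of the mini-batch behaviour mentioned for question 1, a minimal sketch (the toy tensor and batch size are assumptions for illustration only):

import torch
import torch.utils.data as data

# toy stand-in for the concatenated json blocks
samples = torch.arange(1000, dtype=torch.float32).unsqueeze(1)
dataset = data.TensorDataset(samples)

# the DataLoader yields mini-batches of batch_size samples
# instead of the whole dataset at once
loader = data.DataLoader(dataset, batch_size=64, shuffle=True)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([64, 1]) for every full batch
    break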