 

PyTorch DataLoader: loading the complete dataset into RAM

I was wondering whether the PyTorch DataLoader can also load the complete dataset into RAM, so that performance does not suffer if enough RAM is available.

asked Sep 16 '25 by pedrojose_moragallegos

2 Answers

A concrete example of the approach described in the other answer:

import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        # keep a reference to the full dataset, which is already loaded into RAM
        self.data = data
    def __getitem__(self, index):
        # pure in-memory indexing, no disk I/O here
        return self.data['x'][index, :], self.data['y'][index, :]
    def __len__(self):
        return self.data['x'].shape[0]

torch_data_train = MyDataset(data_train)
dataload_train = DataLoader(torch_data_train, batch_size=batch_size, shuffle=True, num_workers=2)
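To show what comes out of that loader, here is a small usage sketch continuing the snippet above; data_train and batch_size are hypothetical placeholders (random tensors and 32), since the answer leaves them undefined:

# hypothetical in-memory data: 1000 samples, 10 features, one target each
data_train = {'x': torch.randn(1000, 10), 'y': torch.randn(1000, 1)}
batch_size = 32

torch_data_train = MyDataset(data_train)
dataload_train = DataLoader(torch_data_train, batch_size=batch_size, shuffle=True, num_workers=2)

for x_batch, y_batch in dataload_train:
    # each batch is sliced from tensors that already live in RAM
    print(x_batch.shape, y_batch.shape)   # torch.Size([32, 10]) torch.Size([32, 1])
    break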
answered Sep 18 '25 by tea_pea


You can extend torch.utils.data.Dataset and create your own Dataset implementation. In the __init__ method of your custom dataset you can then load all of the data into a list or any other data structure, which will be held fully in RAM. __getitem__ then only accesses that structure and returns a single item.
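For illustration, a minimal sketch of that idea; the folder of per-sample .pt files is a hypothetical layout, the point is only that everything is read from disk once in __init__:

import os
import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    def __init__(self, folder):
        # read every sample from disk once; afterwards the whole dataset lives in RAM
        self.samples = [torch.load(os.path.join(folder, name))
                        for name in sorted(os.listdir(folder))
                        if name.endswith('.pt')]
    def __getitem__(self, index):
        # no disk I/O here, just a list lookup
        return self.samples[index]
    def __len__(self):
        return len(self.samples)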

answered Sep 18 '25 by Deusy94