 

PyTorch DataLoader: loading the complete dataset into RAM

I was wondering whether the PyTorch DataLoader can also load the complete dataset into RAM, so that performance does not suffer if enough RAM is available.

asked Sep 16 '25 by pedrojose_moragallegos

2 Answers

A concrete example of the approach described in the other answer:

import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        # keep a reference to the full dataset, which is already loaded into RAM
        self.data = data
    def __getitem__(self, index):
        # pure in-memory indexing, no disk I/O here
        return self.data['x'][index, :], self.data['y'][index, :]
    def __len__(self):
        return self.data['x'].shape[0]

torch_data_train = MyDataset(data_train)
dataload_train = DataLoader(torch_data_train, batch_size=batch_size, shuffle=True, num_workers=2)
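To show what comes out of that loader, here is a small usage sketch continuing the snippet above; data_train and batch_size are hypothetical placeholders (random tensors and 32), since the answer leaves them undefined:

# hypothetical in-memory data: 1000 samples, 10 features, one target each
data_train = {'x': torch.randn(1000, 10), 'y': torch.randn(1000, 1)}
batch_size = 32

torch_data_train = MyDataset(data_train)
dataload_train = DataLoader(torch_data_train, batch_size=batch_size, shuffle=True, num_workers=2)

for x_batch, y_batch in dataload_train:
    # each batch is sliced from tensors that already live in RAM
    print(x_batch.shape, y_batch.shape)   # torch.Size([32, 10]) torch.Size([32, 1])
    break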
answered Sep 18 '25 by tea_pea


You can extend torch.utils.data.Dataset and create your own Dataset implementation. In the __init__ method of your custom dataset you can then load all of the data into a list or any other data structure, which will be held fully in RAM. __getitem__ then only accesses that structure and returns a single item.
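For illustration, a minimal sketch of that idea; the folder of per-sample .pt files is a hypothetical layout, the point is only that everything is read from disk once in __init__:

import os
import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    def __init__(self, folder):
        # read every sample from disk once; afterwards the whole dataset lives in RAM
        self.samples = [torch.load(os.path.join(folder, name))
                        for name in sorted(os.listdir(folder))
                        if name.endswith('.pt')]
    def __getitem__(self, index):
        # no disk I/O here, just a list lookup
        return self.samples[index]
    def __len__(self):
        return len(self.samples)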

answered Sep 18 '25 by Deusy94