I was wondering if the PyTorch DataLoader can also fetch the complete dataset into RAM, so that performance does not suffer as long as enough RAM is available.
A concrete example of the answer below:
import torch
from torch.utils.data import DataLoader

class mydataset(torch.utils.data.Dataset):
    def __init__(self, data):
        # Keep a reference to the already-loaded data; the whole
        # dataset therefore stays resident in RAM.
        self.data = data

    def __getitem__(self, index):
        # Pure in-memory indexing, no disk I/O per item.
        return self.data['x'][index, :], self.data['y'][index, :]

    def __len__(self):
        return self.data['x'].shape[0]

torch_data_train = mydataset(data_train)
dataload_train = DataLoader(torch_data_train, batch_size=batch_size, shuffle=True, num_workers=2)
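Note that data_train and batch_size are assumed to be defined elsewhere in the surrounding code; for the snippet to run end to end, data_train could hypothetically look like:

# Hypothetical example values, only to make the snippet above runnable:
data_train = {'x': torch.randn(1000, 20),   # 1000 samples, 20 features each
              'y': torch.randn(1000, 1)}    # one target per sample
batch_size = 64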
You can extend torch.utils.data.Dataset and create your own Dataset implementation. In the __init__ method of your custom dataset you can load all the data into a list, tensor, or any other in-memory structure, so that it is held entirely in RAM. __getitem__ then only indexes into that structure and returns a single item, without touching the disk again.
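For instance, here is a minimal sketch of a Dataset that performs the loading itself in __init__, assuming the features and labels are stored in two NumPy files (the file names train_x.npy and train_y.npy are placeholders):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryDataset(Dataset):
    def __init__(self, x_path, y_path):
        # Read both arrays once, up front; afterwards the entire
        # dataset lives in RAM and __getitem__ never hits the disk.
        self.x = torch.from_numpy(np.load(x_path)).float()
        self.y = torch.from_numpy(np.load(y_path)).float()

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.x.shape[0]

# Placeholder file names; substitute your own data files.
train_set = InMemoryDataset('train_x.npy', 'train_y.npy')
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)

If the arrays are too large to keep in RAM, a memory-mapped load (np.load(..., mmap_mode='r')) trades memory for per-item disk reads.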