 

How to put datasets created by torchvision.datasets in GPU in one operation?

I’m working with CIFAR10, which I create with torchvision.datasets. I need the GPU to accelerate the computation, but I can’t find a way to put the whole dataset onto the GPU at once. My model uses mini-batches, and dealing with each batch separately is really time-consuming.

I've tried moving each mini-batch to the GPU separately, but it seems really slow.

OzymandiaSt asked Feb 02 '26 20:02

1 Answer

TL;DR

You won't save time by moving the entire dataset at once.


I don't think you'd necessarily want to do that even if you have the GPU memory to hold the entire dataset (and CIFAR10 is tiny by today's standards).

I tried various batch sizes and timed the transfer to GPU as follows:

from time import time

import torchvision
import torchvision.transforms as T
from matplotlib.pyplot import plot
from torch.utils.data import DataLoader

dataset = torchvision.datasets.CIFAR10(root='./data', download=True,
                                       transform=T.ToTensor())
num_workers = 1  # Set this as needed

def time_gpu_cast(batch_size=1):
    start_time = time()
    for x, y in DataLoader(dataset, batch_size, num_workers=num_workers):
        x.cuda(); y.cuda()  # .cuda() returns a GPU copy; we only time it
    return time() - start_time

# Try various batch sizes
cast_times = [(2 ** bs, time_gpu_cast(2 ** bs)) for bs in range(15)]
# Try the entire dataset like you want to do
cast_times.append((len(dataset), time_gpu_cast(len(dataset))))

plot(*zip(*cast_times))  # Plot transfer time against batch size

For num_workers = 1, this is what I got: [plot: transfer time vs. batch size, serial loading]

And if we try parallel loading (num_workers = 8), the trend becomes even clearer: [plot: transfer time vs. batch size, parallel loading]

Vaisakh answered Feb 05 '26 09:02