
CUDA vs. DataParallel: Why the difference?

Tags:

pytorch

I have a simple neural network model, and I apply either cuda() or DataParallel() to the model as follows:

model = torch.nn.DataParallel(model).cuda()

OR,

model = model.cuda()

When I don't use DataParallel and instead simply move my model to the GPU with cuda(), I need to explicitly move the batch inputs to cuda() before passing them to the model; otherwise it raises the following error:

torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor)

But with DataParallel, the code works fine. Everything else is the same. Why does this happen? Why don't I need to move the batch inputs to cuda() explicitly when I use DataParallel?
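For reference, the plain-cuda() path needs the inputs moved to the same device as the model. A minimal sketch (the tiny Linear model and batch here are hypothetical stand-ins, since the question's actual model isn't shown):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for the question's network.
model = nn.Linear(10, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # same effect as model.cuda() on a GPU machine

batch = torch.randn(4, 10)     # batches from a DataLoader start out on the CPU
out = model(batch.to(device))  # without this .to(device), the plain-cuda()
                               # path fails with a tensor-type mismatch
print(out.shape)               # torch.Size([4, 2])
```

Using `.to(device)` instead of `.cuda()` keeps the same script runnable on CPU-only machines.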

Asked by Wasi Ahmad, Jun 16 '17

People also ask

What is the difference between DataParallel and DistributedDataParallel?

DataParallel is single-process, multi-thread, and only works on a single machine, while DistributedDataParallel is multi-process and works for both single- and multi-machine training.

What is DataParallel PyTorch?

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension (other objects will be copied once per device).
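The batch-dimension chunking can be sketched as follows (the Linear module and batch size are hypothetical; on a machine with no GPU, DataParallel simply falls back to calling the wrapped module directly):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)           # hypothetical module
dp_model = nn.DataParallel(model)  # replicates across all visible GPUs

# A batch of 8 is chunked along dim 0: with 2 GPUs, each replica sees 4 rows.
x = torch.randn(8, 10)
y = dp_model(x)   # per-device outputs are gathered back onto the default device
print(y.shape)    # torch.Size([8, 2])
```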

Does PyTorch use multiple GPUs?

There are three main ways to use PyTorch with multiple GPUs. One of these is data parallelism: the dataset is broken into subsets, which are processed in batches on different GPUs using replicas of the same model; the results are then combined and averaged into one version of the model.

How does PyTorch parallel work?

Distributed Data Parallel (DDP) in PyTorch uses multiprocessing instead of threading and runs propagation through the model as a separate process for each GPU. DDP replicates the model across multiple GPUs, each of which is controlled by one process. A process here is essentially a script instance running on your system.
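A minimal single-process illustration of DDP, using the CPU "gloo" backend so it runs anywhere (the model and the address/port values are placeholder assumptions; in real multi-GPU training you would launch one process per GPU, e.g. with torchrun):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# One process, world_size=1, purely for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(10, 2)  # hypothetical module
ddp_model = DDP(model)    # each process holds one full replica of the model

out = ddp_model(torch.randn(4, 10))
print(out.shape)          # torch.Size([4, 2])

dist.destroy_process_group()
```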


1 Answer

Because DataParallel allows CPU inputs: its first step is to transfer the inputs to the appropriate GPUs.

Info source: https://discuss.pytorch.org/t/cuda-vs-dataparallel-why-the-difference/4062/3
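To sketch the difference (again with a hypothetical stand-in model): DataParallel's scatter step moves each input chunk onto its target GPU before the forward pass, so a CPU tensor can be passed in directly.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # hypothetical stand-in for the question's model
if torch.cuda.is_available():
    model = nn.DataParallel(model).cuda()

# A CPU tensor goes straight in: DataParallel scatters each chunk to its
# target GPU itself, which is why no explicit .cuda() call is needed.
cpu_batch = torch.randn(4, 10)
out = model(cpu_batch)
print(out.shape)  # torch.Size([4, 2])
```

With the plain model.cuda() version there is no such scatter step, so the same CPU tensor produces the type-mismatch error shown in the question.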

Answered by Wasi Ahmad, Nov 15 '22