I have a simple neural network model, and I apply either cuda() or DataParallel() to the model, like so:

model = torch.nn.DataParallel(model).cuda()

or:

model = model.cuda()

When I don't use DataParallel and simply move my model to cuda(), I also have to explicitly move the batch inputs to cuda() before passing them to the model; otherwise it returns the following error:

torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor)

With DataParallel, however, the code works fine. Everything else is the same. Why does this happen? Why, when I use DataParallel, don't I need to move the batch inputs to cuda() explicitly?
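Here is a minimal, self-contained version of what I am doing (the embedding layer and batch shape are toy choices just for illustration; the embedding lookup is what produces the index_select call in the error):

import torch
import torch.nn as nn

model = nn.Embedding(100, 16)
inputs = torch.randint(0, 100, (8, 4))   # CPU LongTensor batch

# Variant 1: DataParallel -- CPU inputs are accepted; DataParallel
# scatters them onto the GPUs itself before the forward pass.
dp_model = nn.DataParallel(model).cuda()
out = dp_model(inputs)                   # works with CPU inputs

# Variant 2: plain cuda() -- inputs must be moved explicitly, or the
# index_select sees a cuda weight and a CPU index tensor and fails.
cuda_model = model.cuda()
out = cuda_model(inputs.cuda())          # works
# out = cuda_model(inputs)               # raises the error quoted above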
Comparison between DataParallel and DistributedDataParallel

First, DataParallel is single-process and multi-threaded, and it only works on a single machine, while DistributedDataParallel is multi-process and works for both single- and multi-machine training. DataParallel parallelizes the application of the given module by splitting the input across the specified devices, chunking along the batch dimension (other objects are copied once per device); the per-GPU results are then combined back in one version of the model. DistributedDataParallel, by contrast, uses multiprocessing instead of threading: the model is replicated across the GPUs, each replica controlled by its own process, and propagation through the model runs separately in each process.
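For illustration, a minimal sketch of the DDP pattern described above (the toy model, master address, and port are placeholder assumptions, not from the original discussion):

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU: join the process group, pin this process
    # to its device, and wrap the model replica in DDP.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(10, 2).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])

    # Unlike DataParallel, each process feeds its own already-cuda batch.
    inputs = torch.randn(8, 10).cuda(rank)
    out = ddp_model(inputs)
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)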
Because DataParallel allows CPU inputs: its first step is to transfer the inputs to the appropriate GPUs.
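A simplified sketch of that first step (not the actual library source; the batch shape is an assumption):

import torch
import torch.nn as nn

# Roughly what DataParallel.forward does before running the model:
# scatter the batch across the GPUs, moving CPU chunks onto each device.
inputs = torch.randn(8, 10)                        # CPU batch is fine
device_ids = list(range(torch.cuda.device_count()))
chunks = nn.parallel.scatter(inputs, device_ids)   # chunks land on the GPUs
print([c.device for c in chunks])                  # e.g. [cuda:0, cuda:1, ...]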
Info source: https://discuss.pytorch.org/t/cuda-vs-dataparallel-why-the-difference/4062/3