What are the best practices for training one neural net on more than one GPU on one machine?
I'm a little confused by the different options: nn.DataParallel vs. putting different layers on different GPUs with .to('cuda:0') and .to('cuda:1'). The PyTorch docs page I found for the latter method is dated 2017. Is there a standard approach, or does it depend on preference or on the type of model?
Method 1
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10, 10)
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5)

    def forward(self, x):
        x = self.relu(self.net1(x))
        return self.net2(x)

model = ToyModel().to('cuda')
model = nn.DataParallel(model)
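For context, here is a minimal sketch of a training step with Method 1 (the optimizer, loss, batch size, and random data are just placeholder assumptions). nn.DataParallel splits the batch along dimension 0 across the visible GPUs and gathers the outputs back on the default device:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(64, 10).to('cuda')   # whole batch goes to the default GPU
labels = torch.randn(64, 5).to('cuda')

optimizer.zero_grad()
outputs = model(inputs)                   # scattered across GPUs, gathered on 'cuda:0'
loss_fn(outputs, labels).backward()
optimizer.step()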
Method 2
class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10, 10).to('cuda:0')
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5).to('cuda:1')

    def forward(self, x):
        x = self.relu(self.net1(x.to('cuda:0')))
        return self.net2(x.to('cuda:1'))
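A sketch of a training step for Method 2, under the same placeholder assumptions; the main difference is that the labels have to live on the GPU that produces the final output ('cuda:1' here):

model = ToyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(64, 10)              # forward() moves this to 'cuda:0' itself
labels = torch.randn(64, 5).to('cuda:1')  # loss is computed where net2's output lives

optimizer.zero_grad()
outputs = model(inputs)                   # ends up on 'cuda:1'
loss_fn(outputs, labels).backward()
optimizer.step()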
There may well be other ways PyTorch provides to train on more than one GPU. Both of these methods can freeze my system, depending on which model I use them with. In Jupyter the cell stays at [*], and if I don't restart the kernel the screen freezes and I have to do a hard reset. Several multi-GPU tutorials cause my system to hang and freeze like this.
If you cannot fit all the layers of your model on a single GPU, then you can use model parallelism (the article you found describes model parallelism on a single machine, with layer0.to('cuda:0') and layer1.to('cuda:1'), like you mentioned).
If you can, then you can try distributed data parallel - each worker will hold its own copy of the entire model (all layers), and will work on a small portion of the data in each batch. DDP is recommended instead of DP, even if you only use a single machine.
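Roughly, a single-machine DDP setup looks like the sketch below: one process per GPU, spawned with torch.multiprocessing. The world size, port choice, and toy data are assumptions on my part, not part of your code:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29500'          # any free port
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    # Each process owns one GPU and holds a full copy of the model.
    model = ToyModel().to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    # Each rank works on its own slice of the data (toy example here).
    inputs = torch.randn(32, 10).to(rank)
    labels = torch.randn(32, 5).to(rank)

    optimizer.zero_grad()
    loss_fn(ddp_model(inputs), labels).backward()  # gradients are all-reduced across ranks
    optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size)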
Do you have some examples that can reproduce the issues you're having? Have you tried running your code with tiny inputs, and adding print statements to see whether progress is being made?
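For example, a tiny smoke test like the following (arbitrary shapes, assuming the Method 1 model) can show exactly which call hangs; whichever print never appears marks the stuck step:

x = torch.randn(2, 10)
print('forward start')
out = model(x.to('cuda'))
print('forward done, output device:', out.device)
out.sum().backward()
print('backward done')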