Pytorch DataParallel doesn't work when the model contains a tensor operation

If my model contains only nn.Module layers such as nn.Linear, nn.DataParallel works fine.

import torch
import torch.nn as nn

x = torch.randn(100, 10)

class normal_model(torch.nn.Module):
    def __init__(self):
        super(normal_model, self).__init__()
        self.layer = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.layer(x)

model = normal_model()
model = nn.DataParallel(model.to('cuda:0'))
model(x)

However, when my model contains a tensor operation such as the following

class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        self.weight = torch.ones(5, 1, device='cuda:0')  # plain tensor pinned to cuda:0

    def forward(self, x):
        return self.layer(x) @ self.weight

model = custom_model()
model = torch.nn.DataParallel(model.to('cuda:0'))
model(x)

It gives me the following error:

RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 7, in forward
    return self.layer(x) @ self.weight
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:277

How can I avoid this error when my model contains tensor operations?

Raven Cheuk


2 Answers

I have no experience with DataParallel, but I think it might be because your tensor is not part of the model parameters. You can make it one by writing:

self.weight = torch.nn.Parameter(torch.ones(5,1))

Note that you don't have to move it to the GPU when initializing it, because it is now moved automatically when you call model.to('cuda:0').

I can imagine that DataParallel uses the model parameters to move them to the appropriate GPU.
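For example (a minimal sketch, assuming a CUDA device is available), a nn.Parameter attribute is registered with the module and moved by .to(), while a plain tensor attribute is not:

import torch

class demo(torch.nn.Module):
    def __init__(self):
        super(demo, self).__init__()
        self.param = torch.nn.Parameter(torch.ones(5, 1))  # registered with the module
        self.plain = torch.ones(5, 1)                       # just an ordinary attribute

m = demo().to('cuda:0')
print([name for name, _ in m.named_parameters()])  # ['param'] -- 'plain' is not listed
print(m.param.device)  # cuda:0 -- moved along with the module
print(m.plain.device)  # cpu    -- left behind, so DataParallel won't replicate it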

See this answer for more on the difference between a torch tensor and torch.nn.Parameter.

If you don't want the tensor values to be updated by backpropagation during training, you can add requires_grad=False.
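Putting this together, here is a minimal sketch of the model from the question with the tensor registered as a frozen parameter (assuming the two-GPU setup from the error message):

import torch

class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        # registered as a parameter, so .to() and DataParallel handle it;
        # requires_grad=False keeps it fixed during training
        self.weight = torch.nn.Parameter(torch.ones(5, 1), requires_grad=False)

    def forward(self, x):
        return self.layer(x) @ self.weight

x = torch.randn(100, 10)
model = torch.nn.DataParallel(custom_model().to('cuda:0'))
out = model(x)  # no device mismatch; out has shape (100, 1)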

Another way that might work is to override the to method, and initialize the tensor in the forward pass:

class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)

    def forward(self, x):
        # self.device is set by the overridden to() below
        return self.layer(x) @ torch.ones(5, 1, device=self.device)

    def to(self, device: str):
        new_self = super(custom_model, self).to(device)
        new_self.device = device  # remember the target device for forward()
        return new_self

or something like this:

class custom_model(torch.nn.Module):
    def __init__(self, device: str):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        self.weight = torch.ones(5, 1, device=device)

    def forward(self, x):
        return self.layer(x) @ self.weight

    def to(self, device: str):
        new_self = super(custom_model, self).to(device)
        new_self.device = device
        new_self.weight = torch.ones(5, 1, device=device)  # recreate the tensor on the new device
        return new_self
Elgar de Groot


Adding to the answer from @Elgar de Groot, since the OP also wanted to freeze that tensor: you can still use torch.nn.Parameter, but then explicitly set requires_grad to False, like this:

self.weight = torch.nn.Parameter(torch.ones(5,1))
self.weight.requires_grad = False
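As a quick sanity check (a small sketch, not part of the original answer), the frozen parameter still shows up in named_parameters() but can be filtered out when building the optimizer:

import torch

class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        self.weight = torch.nn.Parameter(torch.ones(5, 1))
        self.weight.requires_grad = False  # frozen: excluded from gradient updates

    def forward(self, x):
        return self.layer(x) @ self.weight

model = custom_model()
for name, p in model.named_parameters():
    print(name, p.requires_grad)  # weight -> False, layer.weight / layer.bias -> True

# hand only the trainable parameters to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)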

erpasd