
PyTorch instance tensor not moved to GPU even with explicit cuda() call

I'm working on a project where the model requires access to a tensor that I declare in the constructor (__init__) of the class (I'm subclassing torch.nn.Module), and I then use this tensor in the forward() method via a simple matmul(). The model is sent to the GPU via a cuda() call:

model = Model()
model.cuda()

However, when I forward-propagate a simple input X through:

model(X) # or model.forward(X)

I get

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'

This indicates that the second argument of matmul() (the instance tensor I declared) is on the CPU, while it was expected to be on the GPU (like the rest of the model and the data).

Inside the matmul, the tensor is transposed via matrix.t().

I even tried overriding the cuda() method with:

def cuda(self):
    super().cuda()
    self.matrix.cuda()

The data is already on the GPU, meaning the following line of code was already executed:

X = X.cuda()

Also, the error explicitly refers to argument #2 of matmul, which in this case is the instance tensor (called matrix), not X.

asked Dec 24 '22 by Luis Leal

1 Answer

Let's assume the following:

  1. X is moved correctly to the GPU

  2. The tensor declared in the Model class is a simple attribute.

    i.e., something like the following:

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)

If so, your first attempt wouldn't work, because the nn.Module.cuda() method moves only the Parameters and Buffers of the Module to the GPU.
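You can verify this with the class above:

model = Model()
model.cuda()
print(model.matrix.device)  # "cpu": plain tensor attributes are not moved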

You would need to make Model.matrix a Parameter instead of a regular attribute by wrapping it in the nn.Parameter class. Something like:

self.matrix = nn.Parameter(torch.randn(784, 10))
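For instance, here is a minimal sketch of the fixed class, reusing the 784×10 shape from the example above (the batch size of 32 below is just an illustration):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a Parameter, so Model.cuda() moves it too
        self.matrix = nn.Parameter(torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
model.cuda()                     # self.matrix is now on the GPU as well
X = torch.randn(32, 784).cuda()  # assumed batch of 32 inputs
out = model(X)                   # no device-mismatch error

Note that nn.Parameter sets requires_grad=True by default; if the matrix should stay fixed during training, construct it with requires_grad=False.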

Now, in your second attempt, instead of letting the Module move the tensor automatically as above, you tried to manually call the .cuda() method on Model.matrix within your override.

This doesn't work either because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method.

While nn.Module.cuda() moves all the Parameters and Buffers of the Module to the GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.

The original tensor is unaffected.
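The difference is easy to demonstrate in isolation:

t = torch.randn(3)
t.cuda()         # returns a GPU copy, which is immediately discarded here
print(t.device)  # cpu: t itself never moved

t = t.cuda()     # rebind the name to the GPU copy
print(t.device)  # cuda:0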


In summary, either:

  1. Wrap your matrix attribute as a Parameter, or
  2. Assign the GPU copy back to matrix in your override via:

self.matrix = self.matrix.cuda()

I would suggest the first.
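If you do go with the second option anyway, the override would look roughly like this (a sketch; nn.Module.cuda() takes an optional device argument and conventionally returns self):

def cuda(self, device=None):
    super().cuda(device)
    self.matrix = self.matrix.cuda(device)  # reassign the GPU copy
    return self

Keep in mind this only covers explicit .cuda() calls; moving the model with .to(device) would still leave a plain attribute behind, which is another reason to prefer the Parameter approach.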

answered Dec 25 '22 by Vaisakh