I'm working on a project where the model requires access to a tensor that I declare in the constructor (__init__) of the class (I'm sub-classing torch.nn.Module), and I then use this tensor in the forward() method via a simple matmul(). The model is sent to the GPU via a cuda() call:
model = Model()
model.cuda()
However, when I do forward-propagation of a simple input X through:
model(X) # or model.forward(X)
I get
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'
This indicates that the second argument of matmul (the instance tensor I declared) is on the CPU, while it was expected to be on the GPU (like the rest of the model and data).
In matmul, the tensor is transposed via matrix.t()
I even tried overriding the cuda() method through:
def cuda(self):
    super().cuda()
    self.matrix.cuda()
The data is already on the GPU, meaning the following line of code was already executed:
X = X.cuda()
Also, the error explicitly says argument #2 of matmul, which in this case is the tensor (called matrix), not X.
Let's assume the following:
X is moved correctly to the GPU
The tensor declared in the Model class is a simple attribute.
i.e., something like the following:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
If so, your first attempt wouldn't work, because the nn.Module.cuda() method only moves the Parameters and Buffers of the Module to the GPU.
You would need to make Model.matrix a Parameter instead of a regular attribute, i.e. wrap it in the nn.Parameter class.
Something like:
self.matrix = nn.Parameter(torch.randn(784, 10))
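With that one change, a minimal working version of the model sketched above looks like this (same shapes as in the question; nothing else needs to change):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapping in nn.Parameter registers the tensor with the Module,
        # so model.cuda() / model.to(device) moves it along with the rest.
        self.matrix = nn.Parameter(torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
# The tensor is now visible to the nn.Module machinery:
print(any(p is model.matrix for p in model.parameters()))  # True
```

Note that as a Parameter, matrix will also receive gradients and be picked up by any optimizer built from model.parameters().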
Now consider your second attempt: instead of relying on that automatic move, you tried to manually call the .cuda() method on Model.matrix within the override.
This doesn't work either because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method.
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to the GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
The original tensor is unaffected.
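Because of that copy semantics, an override that keeps matrix a plain tensor has to assign the returned copy back to the attribute. Roughly like this (a sketch of the second option, not tested on your exact setup):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)  # plain attribute, not a Parameter

    def cuda(self, device=None):
        # torch.Tensor.cuda() is out-of-place: keep the GPU copy
        # by rebinding it to the attribute.
        self.matrix = self.matrix.cuda(device)
        return super().cuda(device)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
```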
In summary, either declare the matrix attribute as a Parameter, or assign self.matrix = self.matrix.cuda() in your override.
I would suggest the first.
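A third option, since nn.Module.cuda() moves Buffers as well as Parameters: if the matrix should not be trained, you can register it as a buffer instead. It is then moved by .cuda()/.to() and saved in the state_dict, but it gets no gradients (a sketch with the same shapes as above):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers follow the Module on .cuda()/.to() and appear in
        # state_dict, but are not returned by model.parameters().
        self.register_buffer("matrix", torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)
```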