I'm working on a project where the model requires access to a tensor that I declare in the constructor (__init__) of the class (I'm sub-classing torch.nn.Module), and I then use this tensor in the forward() method via a simple matmul(). The model is sent to the GPU via a cuda() call:
model = Model()
model.cuda()
However, when I do forward-propagation of a simple input X through:
model(X) # or model.forward(X)
I get
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'
This indicates that the second argument of matmul (the instance tensor I declared) is on the CPU, while it was expected to be on the GPU (like the rest of the model and data).
In matmul, the tensor is transposed via matrix.t()
I even tried overriding the cuda() method through:
def cuda(self):
    super().cuda()
    self.matrix.cuda()
The data is already on the GPU, meaning the following line of code was already executed:
X = X.cuda()
Also, the error explicitly says argument #2 of matmul, which in this case is the tensor (called matrix), not X.
Let's assume the following:
X is moved correctly to the GPU
The tensor declared in the Model class is a simple attribute.
i.e., something like the following:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
If so, your first attempt wouldn't work, because the nn.Module.cuda() method only moves the Parameters and Buffers of the Module to the GPU.
You would need to make Model.matrix a Parameter instead of a regular attribute, i.e. wrap it in the nn.Parameter class.
Something like:
self.matrix = nn.Parameter(torch.randn(784, 10))
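With that one change, a minimal working version of the model sketched above looks like this (same shapes as in the question; nothing else needs to change):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapping in nn.Parameter registers the tensor with the Module,
        # so model.cuda() / model.to(device) moves it along with the rest.
        self.matrix = nn.Parameter(torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
# The tensor is now visible to the nn.Module machinery:
print(any(p is model.matrix for p in model.parameters()))  # True
```

Note that as a Parameter, matrix will also receive gradients and be picked up by any optimizer built from model.parameters().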
Now consider your second attempt: instead of relying on that automatic move, you tried to manually call the .cuda() method on Model.matrix within the override.
This doesn't work either because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method.
While nn.Module.cuda() moves all the Parameters and Buffers of the Module to the GPU and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU.
The original tensor is unaffected.
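Because of that copy semantics, an override that keeps matrix a plain tensor has to assign the returned copy back to the attribute. Roughly like this (a sketch of the second option, not tested on your exact setup):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)  # plain attribute, not a Parameter

    def cuda(self, device=None):
        # torch.Tensor.cuda() is out-of-place: keep the GPU copy
        # by rebinding it to the attribute.
        self.matrix = self.matrix.cuda(device)
        return super().cuda(device)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
```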
In summary, either declare the matrix attribute as a Parameter, or assign self.matrix = self.matrix.cuda() in your override.
I would suggest the first.
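A third option, since nn.Module.cuda() moves Buffers as well as Parameters: if the matrix should not be trained, you can register it as a buffer instead. It is then moved by .cuda()/.to() and saved in the state_dict, but it gets no gradients (a sketch with the same shapes as above):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers follow the Module on .cuda()/.to() and appear in
        # state_dict, but are not returned by model.parameters().
        self.register_buffer("matrix", torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)
```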