I'm working on a project where the model needs access to a tensor that I declare in the constructor (__init__) of my class (I'm subclassing torch.nn.Module), and then use in the forward() method via a simple matmul(). The model is moved to the GPU via a cuda() call:
model = Model()
model.cuda()
However, when I forward-propagate a simple input X through:
model(X) # or model.forward(X)
I get
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'mat2'
indicating that the second argument of matmul (the instance tensor I declared) is on the CPU, while it was expected on the GPU (like the rest of the model and data).
In matmul, the tensor is transposed via matrix.t()
I even tried overriding the cuda() method through:

def cuda(self):
    super().cuda()
    self.matrix.cuda()
The data is already on the GPU, meaning the following line of code has already been executed:
X = X.cuda()
Also, the error explicitly refers to argument #2 of matmul, which in this case is the tensor (called matrix), not X.
Let's assume the following:

X is moved correctly to the GPU.
The tensor declared in the Model class is a simple attribute.

i.e. something like the following:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.matrix = torch.randn(784, 10)

    def forward(self, x):
        return torch.matmul(x, self.matrix)
If so, your first attempt wouldn't work, because the nn.Module.cuda() method only moves the Parameters and Buffers of the module to the GPU. You need to make Model.matrix a Parameter instead of a regular attribute by wrapping it in the nn.Parameter class.
Something like:
self.matrix = nn.Parameter(torch.randn(784, 10))
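As a sanity check, here is a minimal CPU-runnable sketch (the shapes and batch size are just illustrative): wrapping the tensor in nn.Parameter registers it with the module, so it shows up in named_parameters() and will be moved by nn.Module.cuda() (or .to(device)) along with everything else:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered with the module: nn.Module.cuda()/.to() will move it.
        self.matrix = nn.Parameter(torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
# The tensor is now listed among the module's parameters.
print('matrix' in dict(model.named_parameters()))  # True
X = torch.randn(32, 784)
print(model(X).shape)  # torch.Size([32, 10])
```

Note that nn.Parameter also makes the tensor trainable (it will be returned by model.parameters() and updated by an optimizer), which is usually what you want for a weight matrix like this.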
Now, about your second attempt: instead of relying on the automatic move to the GPU as above, you manually called .cuda() on Model.matrix within the override. This doesn't work either, because of a subtle difference between the nn.Module.cuda() method and the torch.Tensor.cuda() method. While nn.Module.cuda() moves all the Parameters and Buffers of the Module to the GPU in place and returns itself, torch.Tensor.cuda() only returns a copy of the tensor on the GPU. The original tensor is unaffected.
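The same out-of-place behaviour is shared by all the tensor conversion methods (here I use .double() so the sketch runs without a GPU, but .cuda() behaves the same way): the call returns a new tensor and leaves the original untouched, so the result must be assigned back:

```python
import torch

t = torch.randn(3, 3)   # float32, on CPU
copy = t.double()       # returns a converted copy...
print(t.dtype)          # torch.float32 -- the original is unaffected
print(copy.dtype)       # torch.float64
# By analogy, inside the override you would need the assignment:
# self.matrix = self.matrix.cuda()
```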
In summary, either:

make the matrix attribute a Parameter, or
write self.matrix = self.matrix.cuda() in your override.

I would suggest the first.
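For completeness: if the matrix should move with the module but not be trained, registering it as a buffer (mentioned above alongside Parameters) achieves the same effect. This is a sketch under that assumption:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer moves with .cuda()/.to() but is excluded from optimization.
        self.register_buffer('matrix', torch.randn(784, 10))

    def forward(self, x):
        return torch.matmul(x, self.matrix)

model = Model()
print('matrix' in dict(model.named_buffers()))  # True
print(list(model.parameters()))                 # [] -- nothing trainable
```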