
Why does PyTorch nn.Module.cuda() move only parameters and buffers to the GPU, but not tensors stored as module attributes?

nn.Module.cuda() moves all model parameters and buffers to the GPU.

But why does it not move a tensor that is stored as a plain member of the model?

import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # plain tensor attribute: neither a parameter nor a buffer
        self.expected_moved_cuda_tensor = torch.tensor([0, 2, 3])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)

For the model member tensor, however, the device stays unchanged:

>>> toy_module.expected_moved_cuda_tensor.device
device(type='cpu')
asked Mar 29 '20 00:03 by hsh



1 Answer

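If you define a tensor inside the module, it needs to be registered as either a parameter or a buffer so that the module is aware of it. A tensor assigned as a plain attribute is just an ordinary Python attribute: .cuda() only visits registered parameters, buffers, and submodules, so it never touches that tensor.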
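One way to see what the module is aware of is to list its registered tensors. The following is a minimal sketch that runs on CPU; the class name Probe and the attribute name plain_tensor are hypothetical and only serve the illustration.

```python
import torch


class Probe(torch.nn.Module):  # hypothetical example module
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(2, 2)
        # Plain attribute: stored on the object, but not registered with the module.
        self.plain_tensor = torch.tensor([0, 2, 3])


probe = Probe()
param_names = [name for name, _ in probe.named_parameters()]
buffer_names = [name for name, _ in probe.named_buffers()]
state_keys = list(probe.state_dict().keys())

print(param_names)                   # only the Linear layer's weight and bias
print(buffer_names)                  # no buffers were registered
print("plain_tensor" in state_keys)  # the plain attribute is not tracked
```

The plain tensor shows up in none of these listings, which is consistent with .cuda() leaving it on the CPU.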


Parameters are tensors that are to be trained and will be returned by model.parameters(). They are easy to register: all you need to do is wrap the tensor in the nn.Parameter type and assign it as an attribute, and it will be registered automatically. Note that because parameters require gradients by default, only floating point (or complex) tensors can be wrapped this way.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
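The floating point restriction comes from gradient tracking, since nn.Parameter sets requires_grad=True by default. A small check of that behavior, assuming a reasonably recent PyTorch release (the exact error message may differ between versions, so the sketch only catches the exception type):

```python
import torch

# An integer tensor cannot require gradients, so wrapping it is expected to fail.
try:
    torch.nn.Parameter(torch.tensor([0, 2, 3]))
    int_param_failed = False
except RuntimeError:
    int_param_failed = True

# A floating point tensor works and tracks gradients by default.
float_param = torch.nn.Parameter(torch.tensor([0.0, 2.0, 3.0]))

print(int_param_failed)
print(float_param.requires_grad)
```

If the tensor must stay integer-valued and is not meant to be trained, a buffer is usually the better fit.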

Buffers are tensors that are registered with the module, so methods like .cuda() affect them, but they are not returned by model.parameters(). Buffers are not restricted to a particular data type.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
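Buffers also differ from plain attributes in how they are saved. The sketch below, with a hypothetical WithBuffers class and hypothetical buffer names, shows that a buffer lands in state_dict() by default and that the persistent=False argument of register_buffer (available in recent PyTorch versions) keeps it registered but out of the saved state.

```python
import torch


class WithBuffers(torch.nn.Module):  # hypothetical example module
    def __init__(self) -> None:
        super().__init__()
        # Saved in state_dict by default (persistent=True).
        self.register_buffer("saved_ids", torch.tensor([0, 2, 3]))
        # Still registered as a buffer, but left out of state_dict.
        self.register_buffer("scratch_ids", torch.tensor([1, 1, 1]), persistent=False)


module = WithBuffers()
buffer_names = sorted(name for name, _ in module.named_buffers())
state_keys = list(module.state_dict().keys())
num_params = len(list(module.parameters()))

print(buffer_names)  # both buffers are registered
print(state_keys)    # only the persistent buffer is saved
print(num_params)    # buffers are not parameters
```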

In both of the above cases, the following code behaves the same:

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
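On a machine without a GPU, the same distinction can be observed with a dtype conversion instead of a device move. This is a sketch under the assumption that .double() walks the registered parameters and buffers in the same way .cuda() does; the DtypeProbe class and its attribute names are hypothetical.

```python
import torch


class DtypeProbe(torch.nn.Module):  # hypothetical example module
    def __init__(self) -> None:
        super().__init__()
        self.weight = torch.nn.Parameter(torch.tensor([0.0, 2.0, 3.0]))
        self.register_buffer("registered", torch.tensor([0.0, 2.0, 3.0]))
        # Plain attribute: not registered, so module-wide conversions skip it.
        self.plain = torch.tensor([0.0, 2.0, 3.0])


probe = DtypeProbe().double()

print(probe.weight.dtype)      # converted along with the module
print(probe.registered.dtype)  # converted along with the module
print(probe.plain.dtype)       # left at its original dtype
```

The parameter and the buffer follow the module to float64, while the plain attribute keeps its original float32 dtype.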
answered Oct 27 '22 08:10 by jodag