nn.Module.cuda() moves all model parameters and buffers to the GPU. But why doesn't it move a plain tensor that is merely a member of the module?
import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # a plain member tensor, not registered with the module
        self.expected_moved_cuda_tensor = torch.tensor([0, 2, 3])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

toy_module = ToyModule()
toy_module.cuda()

>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
For the plain member tensor, however, the device stays unchanged:

>>> toy_module.expected_moved_cuda_tensor.device
device(type='cpu')
model.parameters(): PyTorch modules have a method called parameters() which returns an iterator over all registered parameters. param.numel(): iterating over model.parameters() and calling .numel() on each parameter gives the number of elements in that tensor, so summing these counts the model's trainable parameters.
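For example, a minimal sketch of counting parameters this way, using the ToyModule defined above:

num_params = sum(param.numel() for param in toy_module.parameters())
# Linear(2, 2) contributes 2*2 weights + 2 biases, so num_params == 6 here
# (the unregistered member tensor is not counted)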
torch.cuda is used to set up and run CUDA operations. It keeps track of the currently selected GPU, and all CUDA tensors you allocate will by default be created on that device. The selected device can be changed with a torch.cuda.device context manager.
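A short sketch of device selection (it assumes at least one CUDA device is available):

x = torch.tensor([1., 2.]).cuda()   # allocated on the currently selected device
with torch.cuda.device(0):          # explicitly select GPU 0 within this block
    y = torch.tensor([1., 2.]).cuda()
print(x.device, y.device)           # both report cuda:0 on a single-GPU machine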
By default, PyTorch does not allow cross-GPU operations. The exceptions are copy_() and copy-like methods such as to() and cuda(). Unless you enable peer-to-peer memory access, any attempt to launch an operation on tensors spread across different devices will raise an error.
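A sketch of the allowed copy-like transfer (it assumes a machine with two GPUs, cuda:0 and cuda:1):

x = torch.tensor([1., 2.], device='cuda:0')
y = x.to('cuda:1')   # copy-like transfer between GPUs is allowed
# z = x + y          # would raise an error: the operands live on different devices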
This is exactly the error you see if forward() uses the unregistered member tensor, for example as the second argument of a matmul: PyTorch reports that the tensor is on CPU while it was expected on GPU, like the rest of the model and data.
For reference, nn.Module.to() accepts the following arguments:

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – a tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
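A quick sketch of those call forms:

toy_module.to(torch.device('cuda:0'))   # move parameters and buffers to GPU 0
toy_module.to(torch.float64)            # cast floating point parameters and buffers
reference = torch.rand(2, 2, device='cuda:0', dtype=torch.float64)
toy_module.to(reference)                # match the device and dtype of a tensor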
If you define a tensor inside the module, it needs to be registered as either a parameter or a buffer so that the module is aware of it.

Parameters are tensors that are to be trained and will be returned by model.parameters(). They are easy to register: all you need to do is wrap the tensor in the nn.Parameter type and it will be registered automatically. Note that only floating point tensors can be parameters.
class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        # (note the float values: only floating point tensors can be parameters)
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
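Once wrapped in nn.Parameter, the tensor shows up among the module's parameters:

>>> toy_module = ToyModule()
>>> 'expected_moved_cuda_tensor' in dict(toy_module.named_parameters())
True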
Buffers are tensors that will be registered in the module, so methods like .cuda() will affect them, but they will not be returned by model.parameters(). Buffers are not restricted to a particular data type.
class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
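A buffer is saved in the module's state_dict() but not returned by model.parameters():

>>> toy_module = ToyModule()
>>> 'expected_moved_cuda_tensor' in dict(toy_module.named_parameters())
False
>>> 'expected_moved_cuda_tensor' in toy_module.state_dict()
True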
In both of the above cases, the following code behaves the same way:
>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
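As a final check, a forward pass now runs entirely on the GPU, provided the input is moved there as well (a minimal sketch):

>>> output = toy_module(torch.rand(2, 2).cuda())
>>> output.device
device(type='cuda', index=0)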