
What is the difference between register_parameter and register_buffer in PyTorch?

A module's parameters get changed during training; they are what is learned during the training of a neural network. But what is a buffer, and is it also learned during neural network training?

asked Aug 18 '19 by apostofes



2 Answers

The PyTorch documentation for the register_buffer() method reads:

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the persistent state.

As you already observed, model parameters are learned and updated using SGD during the training process.
However, sometimes there are other quantities that are part of a model's "state" and should be
- saved as part of state_dict.
- moved to cuda() or cpu() with the rest of the model's parameters.
- cast to float/half/double with the rest of the model's parameters.
Registering these "arguments" as the model's buffer allows pytorch to track them and save them like regular parameters, but prevents pytorch from updating them using SGD mechanism.

An example of a buffer can be found in the _BatchNorm module, where running_mean, running_var and num_batches_tracked are registered as buffers and updated by accumulating statistics of the data forwarded through the layer. This is in contrast to the weight and bias parameters, which learn an affine transformation of the data using regular SGD optimization.
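
For illustration, here is a minimal sketch of that pattern (the RunningMeanTracker class and its momentum value are made up for this example, they are not part of PyTorch): a learnable weight lives next to a running_mean buffer that is updated by hand, without gradients.

    import torch
    import torch.nn as nn

    class RunningMeanTracker(nn.Module):
        """Toy module: a learnable weight plus a hand-updated running_mean buffer."""
        def __init__(self, num_features, momentum=0.1):
            super().__init__()
            self.momentum = momentum
            # Parameter: updated by the optimizer through gradients.
            self.weight = nn.Parameter(torch.ones(num_features))
            # Buffer: saved in the state_dict and moved/cast with the module,
            # but never touched by the optimizer.
            self.register_buffer("running_mean", torch.zeros(num_features))

        def forward(self, x):
            if self.training:
                with torch.no_grad():  # buffer update is a plain tensor op, no gradients
                    batch_mean = x.mean(dim=0)
                    self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
            return (x - self.running_mean) * self.weight

    m = RunningMeanTracker(4)
    m(torch.randn(8, 4))                              # forward pass updates running_mean
    print([n for n, _ in m.named_parameters()])       # ['weight']
    print(list(m.state_dict().keys()))                # ['weight', 'running_mean']

An SGD step would only ever change weight; running_mean changes only because the forward pass writes to it.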

answered Oct 16 '22 by Shai


Both parameters and buffers are things you create for a module (nn.Module).

Say you have a linear layer nn.Linear. It already has weight and bias parameters. But if you need a new parameter, you use register_parameter() to register a new named parameter, which is a tensor.

When you register a new parameter, it will appear inside the module.parameters() iterator; when you register a buffer, it will not.
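
A quick way to see this (the my_param and my_buffer names are just placeholders for this sketch):

    import torch
    import torch.nn as nn

    m = nn.Linear(3, 2)  # already has 'weight' and 'bias' parameters

    # A new learnable tensor must be wrapped in nn.Parameter.
    m.register_parameter("my_param", nn.Parameter(torch.zeros(2)))
    # A buffer is a plain tensor that the module tracks but the optimizer ignores.
    m.register_buffer("my_buffer", torch.zeros(2))

    print([n for n, _ in m.named_parameters()])  # ['weight', 'bias', 'my_param']
    print([n for n, _ in m.named_buffers()])     # ['my_buffer']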

The difference:

Buffers are named tensors that, unlike parameters, do not receive gradient updates at every step. For buffers, you write your own update logic (it is entirely up to you).

The good thing is that when you save the model, all parameters and buffers are saved, and when you move the model onto or off a CUDA device, the parameters and buffers move with it.
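
As a small sketch of that (again with a made-up my_buffer name), saving and casting treat parameters and buffers the same way:

    import torch
    import torch.nn as nn

    m = nn.Linear(3, 2)
    m.register_buffer("my_buffer", torch.zeros(2))

    # The buffer sits in the state_dict next to the parameters.
    print(list(m.state_dict().keys()))   # ['weight', 'bias', 'my_buffer']
    torch.save(m.state_dict(), "model.pt")

    # Casting (and likewise m.to("cuda") on a GPU machine) carries the buffer along.
    m.half()
    print(m.my_buffer.dtype)             # torch.float16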

answered Oct 16 '22 by prosti