
Different methods for initializing embedding layer weights in PyTorch

Tags:

python

pytorch

There seem to be two ways of initializing embedding layers in PyTorch 1.0 with a uniform distribution.

For example, you have an embedding layer:

self.in_embed = nn.Embedding(n_vocab, n_embed)

And you want to initialize its weights with a uniform distribution. The first way to do this is:

self.in_embed.weight.data.uniform_(-1, 1)

And another one would be:

nn.init.uniform_(self.in_embed.weight, -1.0, 1.0)

My question is: what is the difference between the first and the second initialization form? Do both methods do the same thing?

asked Mar 21 '19 by gil.fernandes



2 Answers

Both are the same:

import torch
import torch.nn as nn

torch.manual_seed(3)
emb1 = nn.Embedding(5, 5)
emb1.weight.data.uniform_(-1, 1)

torch.manual_seed(3)
emb2 = nn.Embedding(5, 5)
nn.init.uniform_(emb2.weight, -1.0, 1.0)

# With the same seed, both initializations produce identical weights.
assert torch.sum(torch.abs(emb1.weight.data - emb2.weight.data)).numpy() == 0

Every tensor has a uniform_ method which initializes it with values drawn from the uniform distribution. The nn.init module also has a uniform_ function which takes a tensor and initializes it with values from the uniform distribution. Both do the same thing; the first uses the tensor's member function and the second uses a general utility function.
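For illustration, here is a minimal sketch (the 3x3 tensor is a throwaway example) showing that both calls operate in place and return the tensor they modified:

import torch
import torch.nn as nn

w = torch.empty(3, 3)

# Tensor.uniform_ is an in-place member function; it returns the same tensor.
assert w.uniform_(-1, 1) is w

# nn.init.uniform_ is a free utility that performs the same in-place fill (under torch.no_grad()).
assert nn.init.uniform_(w, -1.0, 1.0) is w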

answered Oct 18 '22 by mujjiga


To my knowledge, both forms are identical in effect, as @mujjiga answered.

In general, my preference goes towards the second option because:

  1. You have to access the .data attribute in the manual case.

  2. Using torch.nn.init is more explicit and readable (a little subjective).

  3. It allows others to modify your source code more easily: if they were to change the initialization scheme to, say, xavier_uniform, only the function name would have to change (see the short sketch after this list).
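For example, a minimal sketch (the vocabulary and embedding sizes are arbitrary values chosen for illustration) showing that swapping the scheme is just a name change:

import torch.nn as nn

in_embed = nn.Embedding(100, 50)  # arbitrary sizes, for illustration only

# Uniform initialization in [-1, 1]:
nn.init.uniform_(in_embed.weight, -1.0, 1.0)

# Switching to Xavier/Glorot uniform only changes the function name:
nn.init.xavier_uniform_(in_embed.weight)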

A little off-topic: TBH, I think torch.nn.init should be callable on the layer itself, as that would help initialize torch.nn.Sequential models with a simple model.apply(torch.nn.init.xavier_uniform_). Furthermore, it might be beneficial for it to initialize the bias tensor as well (or take an appropriate argument), but it is what it is.
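Until then, here is a minimal sketch of the usual workaround (the layer sizes and the init_weights helper are illustrative, not from the original post): wrap the init call in a small function that checks the layer type and pass it to model.apply.

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),  # arbitrary sizes, for illustration only
    nn.ReLU(),
    nn.Linear(20, 5),
)

def init_weights(m):
    # Only touch layers that actually have weight/bias parameters.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)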

answered Oct 18 '22 by Szymon Maszke