There seem to be two ways of initializing embedding layers in PyTorch 1.0 using a uniform distribution.
For example, suppose you have an embedding layer:
self.in_embed = nn.Embedding(n_vocab, n_embed)
and you want to initialize its weights with a uniform distribution. The first way to do this is:
self.in_embed.weight.data.uniform_(-1, 1)
And another one would be:
nn.init.uniform_(self.in_embed.weight, -1.0, 1.0)
My question is: what is the difference between the first and the second form? Do both methods do the same thing?
Both are the same.
import torch
import torch.nn as nn

# Initialize one embedding with the in-place tensor method...
torch.manual_seed(3)
emb1 = nn.Embedding(5, 5)
emb1.weight.data.uniform_(-1, 1)

# ...and another with nn.init, using the same seed.
torch.manual_seed(3)
emb2 = nn.Embedding(5, 5)
nn.init.uniform_(emb2.weight, -1.0, 1.0)

# Both produce identical weights.
assert torch.sum(torch.abs(emb1.weight.data - emb2.weight.data)).item() == 0
Every tensor has a uniform_ method, which initializes it in place with values drawn from a uniform distribution. The nn.init module also has a uniform_ function, which takes a tensor and initializes it with values from a uniform distribution. Both are the same, except that the first uses the tensor's member function and the second uses a general utility function.
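To see why they are equivalent, here is a rough, simplified sketch of what nn.init.uniform_ does internally (not the exact PyTorch source, which adds argument handling): it just calls the tensor's in-place uniform_ method inside a no_grad block. The helper name my_uniform_ below is made up for illustration.

import torch

def my_uniform_(tensor, a=0.0, b=1.0):
    # Hypothetical re-implementation: fill the tensor in place without
    # tracking the operation in autograd.
    with torch.no_grad():
        return tensor.uniform_(a, b)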
To my knowledge, both forms are identical in effect, as @mujjiga answers.
In general, my preference goes toward the second option because:
- You have to access the .data attribute in the manual case.
- Using torch.nn.init is more explicit and readable (a little subjective).
- It allows others to modify your source code more easily (if they were to change the initialization scheme to, say, xavier_uniform_, only the name would have to change), as in the sketch below.
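As a minimal sketch of that last point (the layer shape and the -1/1 bounds are just placeholders), swapping schemes really is a one-name change:

import torch.nn as nn

layer = nn.Embedding(10, 8)

nn.init.uniform_(layer.weight, -1.0, 1.0)   # uniform initialization
# Switching to Xavier initialization only means changing the function name:
nn.init.xavier_uniform_(layer.weight)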
A little off-topic: TBH, I think torch.nn.init should be callable on the layer itself, as it would help initialize torch.nn.Sequential models using a simple model.apply(torch.nn.init.xavier_uniform_). Furthermore, it might be beneficial for it to initialize the bias tensor as well (or take an appropriate argument), but it is what it is.
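As things stand, the usual workaround is a small helper passed to model.apply. This is a sketch under my own assumptions (the helper name init_weights, the layer types, and the chosen schemes are illustrative, not from the answers above); it initializes both weights and biases of each submodule:

import torch.nn as nn

def init_weights(m):
    # Hypothetical helper: apply Xavier init to weight matrices and zero the biases.
    if isinstance(m, (nn.Linear, nn.Embedding)):
        nn.init.xavier_uniform_(m.weight)
        if getattr(m, "bias", None) is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.apply(init_weights)  # applies init_weights recursively to every submodule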