I'm learning PyTorch and I'm wondering: what does the padding_idx attribute do in torch.nn.Embedding(n1, d1, padding_idx=0)? I've looked everywhere and couldn't find an explanation I could follow. Can you show an example to illustrate this?
nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary x the dimension of each embedding vector, plus a method that does the lookup. When you create an embedding layer, this Tensor is initialised randomly.
An embedding layer lets us convert each word into a fixed-length vector of a defined size. The resulting vector is dense, with real values instead of just 0s and 1s, and the fixed length with reduced dimensionality lets us represent words more effectively.
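To make the lookup concrete, here is a minimal sketch (the sizes 5 and 3 are arbitrary, picked just for illustration):
import torch

embedding = torch.nn.Embedding(5, 3)  # weight Tensor of shape (5, 3), randomly initialised
print(embedding.weight.shape)         # torch.Size([5, 3])

idx = torch.LongTensor([2])
# Calling the layer is just a row lookup into the weight Tensor
print(torch.equal(embedding(idx)[0], embedding.weight[2]))  # True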
nn.EmbeddingBag computes sums or means of 'bags' of embeddings without instantiating the intermediate embeddings; with mode="sum" it is equivalent to Embedding followed by torch.sum(dim=1).
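A quick sketch of that equivalence, sharing one weight Tensor between the two layers (the sizes here are arbitrary):
import torch

emb = torch.nn.Embedding(10, 3)
# An EmbeddingBag in "sum" mode that reuses the same weights
bag = torch.nn.EmbeddingBag.from_pretrained(emb.weight, mode="sum")

x = torch.LongTensor([[1, 2, 4], [5, 6, 7]])      # two bags of three indices each
print(torch.allclose(bag(x), emb(x).sum(dim=1)))  # True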
padding_idx is indeed quite badly described in the documentation.
Basically, it specifies which index passed in the call will mean "zero vector" (which is quite often used in NLP when some token is missing). By default no index maps to a zero vector, as you can see in the example below:
import torch
embedding = torch.nn.Embedding(10, 3)     # 10 indices, each mapped to a 3-dim vector
input = torch.LongTensor([[0, 1, 0, 5]])  # index 0 appears twice
print(embedding(input))
Will give you:
tensor([[[ 0.1280, -1.1390, -2.5007],
[ 0.3617, -0.9280, 1.2894],
[ 0.1280, -1.1390, -2.5007],
[-1.3135, -0.0229, 0.2451]]], grad_fn=<EmbeddingBackward>)
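Note that the first and third rows of the output are identical: the same index always retrieves the same row of the weight Tensor. Continuing the snippet above:
out = embedding(input)
print(torch.equal(out[0, 0], out[0, 2]))            # True -- both come from index 0
print(torch.equal(out[0, 0], embedding.weight[0]))  # True -- it is simply weight row 0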
If you specify padding_idx=0, every input position where the value equals 0 (so the zeroth and second positions here) will be zeroed out, like this (code: embedding = torch.nn.Embedding(10, 3, padding_idx=0)):
tensor([[[ 0.0000, 0.0000, 0.0000],
[-0.4448, -0.2076, 1.1575],
[ 0.0000, 0.0000, 0.0000],
[ 1.3602, -0.6299, -0.5809]]], grad_fn=<EmbeddingBackward>)
If you were to specify padding_idx=5, the last row would be full of zeros, etc.
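A minimal sketch of that case, which also shows the other half of the story: the padding row receives no gradient, so it stays at zero during training:
import torch

embedding = torch.nn.Embedding(10, 3, padding_idx=5)
input = torch.LongTensor([[0, 1, 0, 5]])
out = embedding(input)
print(out[0, 3])                 # tensor([0., 0., 0.], ...) -- index 5 hits the padding row

out.sum().backward()
print(embedding.weight.grad[5])  # tensor([0., 0., 0.]) -- no gradient flows to the padding row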
As per the docs, padding_idx pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters that index. What this means is that wherever you have an item equal to padding_idx, the output of the embedding layer at that index will be all zeros.
Here is an example:
Let us say you have word embeddings for 1000 words, each 50-dimensional, i.e. num_embeddings=1000, embedding_dim=50. Then torch.nn.Embedding works like a lookup table (though the lookup table is trainable):
emb_layer = torch.nn.Embedding(1000, 50)            # lookup table: 1000 rows of 50-dim vectors
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])  # 2 sequences of 4 indices each
y = emb_layer(x)                                    # shape: 2 x 4 x 50
y will be a tensor of shape 2x4x50. I hope this part is clear to you.
Now if I specify padding_idx=2, i.e.
emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
y = emb_layer(x)
then the output will still be 2x4x50, but the 50-dim vectors at positions x[0, 1] and x[1, 2] (0-indexed) will be all zeros, since those input values are 2, which equals padding_idx.
You can think of it as the word at index 2 of the lookup table (the 3rd row, since the table is 0-indexed) not being used for training: that row receives no gradient, so it stays fixed as the pad vector.
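A small sketch verifying both claims with the layer above:
import torch

emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
y = emb_layer(x)

# Positions whose input index equals padding_idx map to the zero vector
print(torch.all(y[0, 1] == 0).item())  # True, because x[0, 1] == 2
print(torch.all(y[1, 2] == 0).item())  # True, because x[1, 2] == 2

# Row 2 of the table receives no gradient, so it is never updated during training
y.sum().backward()
print(torch.all(emb_layer.weight.grad[2] == 0).item())  # True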