From multiple searches and the PyTorch documentation itself, I could figure out that inside the embedding layer there is a lookup table where the embedding vectors are stored. What I am not able to understand:
Any help in this will be appreciated. Thanks.
Uses of PyTorch Embedding: we can say that the embedding layer works like a lookup table where each word is converted to a number, and these numbers are used to index into the table. Thus, the keys are the words (their indices), and the values are the word vectors.
The Embedding layer takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. The vectors add a dimension to the output array; the resulting dimensions are (batch, sequence, embedding).
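For example, here is a minimal PyTorch sketch (assuming a toy vocabulary of 10 words and 50-dimensional vectors, as in the example further below) of how a batch of token indices picks up that extra embedding dimension:
import torch

# A hypothetical batch of 2 sentences, each 4 tokens long (all indices < 10)
batch = torch.tensor([[1, 5, 9, 2],
                      [4, 0, 7, 3]], dtype=torch.long)  # shape: (batch, sequence)
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)
output = embedding(batch)
print(output.shape)  # torch.Size([2, 4, 50]) -> (batch, sequence, embedding)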
A related module, torch.nn.EmbeddingBag, computes sums or means of 'bags' of embeddings without instantiating the intermediate embeddings; with mode="sum" it is equivalent to Embedding followed by torch.sum(dim=1).
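As a quick sketch of that equivalence (the sizes below are made up for illustration):
import torch

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=3)
# Reuse the same weights in an EmbeddingBag that sums each bag of embeddings
embedding_bag = torch.nn.EmbeddingBag.from_pretrained(embedding.weight, mode="sum")
# Two bags of three token indices each
bags = torch.tensor([[1, 5, 9], [2, 4, 6]], dtype=torch.long)
summed = embedding(bags).sum(dim=1)  # Embedding followed by a sum over the sequence
bagged = embedding_bag(bags)         # EmbeddingBag skips the intermediate (2, 3, 3) tensor
assert torch.allclose(summed, bagged)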
What is the embedding layer in Keras? Keras provides an embedding layer that converts each word into a fixed-length vector of defined size. The one-hot-encoding technique generates a large sparse vector to represent a single word, whereas in an embedding layer every word gets a dense, real-valued vector of fixed length.
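To make that contrast concrete, here is a small sketch (again with a hypothetical 10-word vocabulary) showing that an embedding lookup returns the same row you would get by multiplying a one-hot vector with the weight matrix, just without materializing the sparse one-hot representation:
import torch

num_words, dim = 10, 4
embedding = torch.nn.Embedding(num_words, dim)
token = torch.tensor([3])
one_hot = torch.nn.functional.one_hot(token, num_classes=num_words).float()
# Multiplying the one-hot row by the weight matrix selects the same row
# that the embedding layer looks up directly
assert torch.allclose(one_hot @ embedding.weight, embedding(token))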
That is a really good question! The embedding layer of PyTorch (same goes for TensorFlow) serves as a lookup table just to retrieve the embeddings for each of the inputs, which are indices. Consider the following case: you have a sentence where each word is tokenized, so each word in your sentence is represented with a unique integer (index). If the list of indices (words) is [1, 5, 9], and you want to encode each of the words with a 50-dimensional vector (embedding), you can do the following:
import torch

# The list of tokens (word indices)
tokens = torch.tensor([1, 5, 9], dtype=torch.long)
# Define an embedding layer, where you know upfront that in total you
# have 10 distinct words, and you want each word to be encoded with
# a 50 dimensional vector
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)
# Obtain the embeddings for each of the words in the sentence
embedded_words = embedding(tokens)
Now, to answer your questions:
During the forward pass, the values for each of the tokens in your sentence are obtained in much the same way as NumPy indexing works. Because, under the hood, this is a differentiable operation, during the backward pass (training) PyTorch computes the gradients for each of the embeddings and adjusts them accordingly.
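A small illustration of that (reusing the 10-word, 50-dimensional layer from above): only the rows of the weight matrix that were actually looked up receive a non-zero gradient.
import torch

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)
tokens = torch.tensor([1, 5, 9], dtype=torch.long)
# A dummy loss, just to trigger the backward pass
loss = embedding(tokens).sum()
loss.backward()
print(embedding.weight.grad[1].abs().sum())  # non-zero: index 1 was looked up
print(embedding.weight.grad[0].abs().sum())  # zero: index 0 was never used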
The weights are the embeddings themselves. The word embedding matrix is actually a weight matrix that will be learned during training.
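You can inspect this directly; the sketch below assumes the same 10-word, 50-dimensional layer as before:
import torch

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)
# The lookup table itself is the trainable weight matrix
print(type(embedding.weight))          # <class 'torch.nn.parameter.Parameter'>
print(embedding.weight.shape)          # torch.Size([10, 50])
print(embedding.weight.requires_grad)  # True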
There is no actual function per se. As we defined above, the sentence is already tokenized (each word is represented with a unique integer), and we can just obtain the embeddings for each of the tokens in the sentence.
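For completeness, that tokenization step is usually just a mapping from words to integer indices; the vocabulary below is purely hypothetical:
import torch

# A toy vocabulary mapping each known word to a unique integer index
vocab = {"the": 1, "cat": 5, "sleeps": 9}
sentence = "the cat sleeps"
tokens = torch.tensor([vocab[word] for word in sentence.split()], dtype=torch.long)
print(tokens)  # tensor([1, 5, 9])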
Finally, since I have mentioned the indexing example many times, let us try it out.
import numpy as np
import torch

# Let us assume that we have a pre-trained embedding matrix
pretrained_embeddings = torch.rand(10, 50)
# We can initialize our embedding module from the embedding matrix
embedding = torch.nn.Embedding.from_pretrained(pretrained_embeddings)
# Some tokens
tokens = torch.tensor([1,5,9], dtype=torch.long)
# Token embeddings from the lookup table
lookup_embeddings = embedding(tokens)
# Token embeddings obtained with indexing
indexing_embeddings = pretrained_embeddings[tokens]
# Voila! They are the same
np.testing.assert_array_equal(lookup_embeddings.numpy(), indexing_embeddings.numpy())