 

What "exactly" happens inside embedding layer in pytorch?


From multiple searches and PyTorch's documentation itself, I could figure out that inside the embedding layer there is a lookup table where the embedding vectors are stored. What I am not able to understand is:

  1. What exactly happens during training in this layer?
  2. What are the weights, and how are the gradients of those weights computed?
  3. My intuition is that there should at least be a function with some parameters that produces the keys for the lookup table. If so, what is that function?

Any help in this will be appreciated. Thanks.

asked Nov 05 '19 by Rituraj Kaushik




1 Answer

That is a really good question! The embedding layer of PyTorch (the same goes for TensorFlow) serves as a lookup table that simply retrieves the embedding for each of the inputs, which are indices. Consider the following case: you have a sentence where each word is tokenized, so each word in your sentence is represented by a unique integer (index). If the list of indices (words) is [1, 5, 9], and you want to encode each of the words with a 50-dimensional vector (embedding), you can do the following:

import torch

# The list of tokens
tokens = torch.tensor([1, 5, 9], dtype=torch.long)
# Define an embedding layer, where you know upfront that in total you
# have 10 distinct words, and you want each word to be encoded with
# a 50 dimensional vector
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)
# Obtain the embeddings for each of the words in the sentence
embedded_words = embedding(tokens)
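
Each index simply selects one row of the layer's weight matrix, so three tokens give a (3, 50) tensor, while the full lookup table itself is a learnable parameter of shape (10, 50). A quick check (this just prints the shapes of the objects created above):

print(embedded_words.shape)    # torch.Size([3, 50])
print(embedding.weight.shape)  # torch.Size([10, 50])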

Now, to answer your questions:

  1. During the forward pass, the values for each of the tokens in your sentence are obtained the same way NumPy indexing works: the token indices select rows of the weight matrix. Because this row selection is differentiable with respect to the weights, during the backward pass (training) PyTorch computes the gradients for the selected embeddings and adjusts them accordingly (see the sketch after this list).

  2. The weights are the embeddings themselves. The word embedding matrix is actually a weight matrix that will be learned during training.

  3. There is no actual function per se. As we defined above, the sentence is already tokenized (each word is represented with a unique integer), and we can just obtain the embeddings for each of the tokens in the sentence.
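
To make points 1 and 2 concrete, here is a small sketch. The vocabulary below is a made-up toy mapping (the word-to-index mapping comes from a tokenizer/vocabulary built outside the layer, not from a learned function inside nn.Embedding), and the loss is a dummy scalar used only to trigger a backward pass:

import torch

# Hypothetical toy vocabulary: the word -> index mapping is built outside
# the layer (e.g. a plain dict); it is not a learned function.
vocab = {"the": 0, "cat": 1, "sat": 5, "mat": 9}
tokens = torch.tensor([vocab["cat"], vocab["sat"], vocab["mat"]], dtype=torch.long)

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=50)

# Forward pass: plain row selection from the (10, 50) weight matrix
embedded_words = embedding(tokens)

# Dummy scalar loss, only so that we can call backward()
loss = embedded_words.sum()
loss.backward()

# Only the rows that were actually looked up (1, 5 and 9) receive
# non-zero gradients; these are exactly the embeddings that get updated.
grad_row_norms = embedding.weight.grad.norm(dim=1)
print((grad_row_norms > 0).nonzero().flatten())   # tensor([1, 5, 9])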

Finally, since I have mentioned indexing a few times, let us try it out:

import numpy as np
import torch

# Let us assume that we have a pre-trained embedding matrix
pretrained_embeddings = torch.rand(10, 50)
# We can initialize our embedding module from the embedding matrix
embedding = torch.nn.Embedding.from_pretrained(pretrained_embeddings)
# Some tokens
tokens = torch.tensor([1, 5, 9], dtype=torch.long)

# Token embeddings from the lookup table
lookup_embeddings = embedding(tokens)
# Token embeddings obtained with indexing
indexing_embeddings = pretrained_embeddings[tokens]
# Voila! They are the same
np.testing.assert_array_equal(lookup_embeddings.numpy(), indexing_embeddings.numpy())
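
One caveat to be aware of: Embedding.from_pretrained freezes the weights by default (freeze=True), so if you want pre-trained embeddings to be fine-tuned during training you need to opt out of that, e.g.:

# By default the weights are frozen (requires_grad=False); pass freeze=False
# if the pre-trained embeddings should keep learning during training.
trainable_embedding = torch.nn.Embedding.from_pretrained(pretrained_embeddings, freeze=False)
print(trainable_embedding.weight.requires_grad)   # True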
answered Sep 17 '22 by gorjan