Embedding in pytorch

Does Embedding make similar words closer to each other? And do I just need to give it all the sentences? Or is it just a lookup table, so that I need to code the model myself?

asked Jun 07 '18 by user1927468

People also ask

What is embedding in PyTorch?

A PyTorch embedding maps items from a high-dimensional, discrete space (such as word indices over a vocabulary) into a low-dimensional space of dense vectors, so that models can work with the items more easily and the learned representations can be reused on new problems.
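For example, a minimal sketch of that translation (the vocabulary size and dimension below are arbitrary choices):

import torch
from torch import nn

# A hypothetical vocabulary of 10,000 words mapped to 50-dimensional vectors
embedding = nn.Embedding(10000, 50)

# Word indices (a 10,000-dimensional one-hot space) become dense 50-dimensional vectors
vectors = embedding(torch.LongTensor([1, 42, 999]))
print(vectors.shape)  # torch.Size([3, 50])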

What is embedding bag in PyTorch?

nn.EmbeddingBag computes sums or means of 'bags' of embeddings without instantiating the intermediate embeddings; with mode="sum" it is equivalent to nn.Embedding followed by torch.sum(dim=1).
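A small sketch of that equivalence (the sizes and indices are arbitrary):

import torch
from torch import nn

embedding = nn.Embedding(10, 3)
embedding_bag = nn.EmbeddingBag(10, 3, mode="sum")
embedding_bag.weight.data.copy_(embedding.weight.data)  # share the same weights

indices = torch.LongTensor([[1, 2, 4], [4, 3, 2]])  # two "bags" of three words each

# EmbeddingBag sums each bag directly...
bag_out = embedding_bag(indices)
# ...which matches Embedding followed by an explicit sum over the bag dimension
manual_out = embedding(indices).sum(dim=1)

print(torch.allclose(bag_out, manual_out))  # True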

Is PyTorch embedding trainable?

nn.Embedding is a model parameter layer and is trainable by default. If you fine-tune word vectors during training, they are treated as model parameters and updated by backpropagation. You can also make the layer untrainable by freezing its weights, i.e. disabling their gradient.
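A minimal sketch of freezing the layer (the sizes here are arbitrary):

import torch
from torch import nn

embedding = nn.Embedding(1000, 128)
print(embedding.weight.requires_grad)  # True: the vectors are updated by backpropagation

# Freeze the layer so its vectors are no longer updated during training
embedding.weight.requires_grad_(False)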


2 Answers

nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary times the dimension of each vector embedding, plus a method that does the lookup.

When you create an embedding layer, the Tensor is initialised randomly. It is only when you train it that this similarity between similar words should appear, unless you have overwritten the values of the embedding with a previously trained model such as GloVe or Word2Vec, but that's another story.
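If you do want to start from pretrained vectors, a minimal sketch could look like the following, where glove_weights is a stand-in for a real (vocab_size, vector_size) matrix of GloVe or Word2Vec vectors:

import torch
from torch import nn

# glove_weights stands in for a real matrix of pretrained word vectors
glove_weights = torch.randn(1000, 128)

# Build the embedding layer directly from the pretrained matrix
# (freeze=False keeps the vectors trainable; the default is to freeze them)
embedding = nn.Embedding.from_pretrained(glove_weights, freeze=False)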

So, once you have the embedding layer defined and the vocabulary defined and encoded (i.e. a unique number assigned to each word in the vocabulary), you can use the instance of the nn.Embedding class to get the corresponding embeddings.

For example:

import torch
from torch import nn

embedding = nn.Embedding(1000, 128)   # vocabulary of 1000 words, 128-dimensional vectors
embedding(torch.LongTensor([3, 4]))   # look up the embeddings for words 3 and 4

will return the embedding vectors corresponding to words 3 and 4 in your vocabulary. As no model has been trained, they will be random.
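To see that the vectors only move once you train, here is a purely illustrative sketch in which the embedding feeds a small linear classifier and is updated by backpropagation:

import torch
from torch import nn

# Purely illustrative setup: the embedding feeds a small linear classifier
embedding = nn.Embedding(1000, 128)
classifier = nn.Linear(128, 2)
optimizer = torch.optim.SGD(list(embedding.parameters()) + list(classifier.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

words = torch.LongTensor([3, 4])    # dummy word indices
labels = torch.LongTensor([0, 1])   # dummy labels

before = embedding.weight[3].detach().clone()

loss = loss_fn(classifier(embedding(words)), labels)
loss.backward()
optimizer.step()

print(torch.equal(before, embedding.weight[3]))  # False: training has moved the vector for word 3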

answered Oct 17 '22 by Escachator


You could treat nn.Embedding as a lookup table where the key is the word index and the value is the corresponding word vector. However, before using it you should specify the size of the lookup table, and initialize the word vectors yourself. Following is a code example demonstrating this.

import torch
import torch.nn as nn

# vocab_size is the number of words in your train, val and test set
# vector_size is the dimension of the word vectors you are using
embed = nn.Embedding(vocab_size, vector_size)

# initialize the word vectors; pretrained_weights is a
# numpy array of size (vocab_size, vector_size) and
# pretrained_weights[i] retrieves the word vector of the
# i-th word in the vocabulary
embed.weight.data.copy_(torch.from_numpy(pretrained_weights))

# Then turn the word indices into actual word vectors
vocab = {"some": 0, "words": 1}
word_indexes = [vocab[w] for w in ["some", "words"]]
word_vectors = embed(torch.LongTensor(word_indexes))
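The resulting word_vectors tensor has shape (2, vector_size), one row per looked-up word. PyTorch also provides nn.Embedding.from_pretrained(torch.from_numpy(pretrained_weights)), which builds the same layer from the pretrained weights in a single call.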
answered Oct 17 '22 by AveryLiu