I'm following this tutorial: https://cs230-stanford.github.io/pytorch-nlp.html. In it, a neural model is created using nn.Module, with an embedding layer that is initialized as

self.embedding = nn.Embedding(params['vocab_size'], params['embedding_dim'])

vocab_size is the total number of training samples, which is 4000. embedding_dim is 50. The relevant piece of the forward method is below:
def forward(self, s):
    # apply the embedding layer that maps each token to its embedding
    s = self.embedding(s)   # dim: batch_size x batch_max_len x embedding_dim
I get this exception when passing a batch to the model like so

model(train_batch)

train_batch is a numpy array of dimension batch_size x batch_max_len. Each sample is a sentence, and each sentence is padded so that it has the length of the longest sentence in the batch.
File "/Users/liam_adams/Documents/cs512/research_project/custom/model.py", line 34, in forward s = self.embedding(s) # dim: batch_size x batch_max_len x embedding_dim File "/Users/liam_adams/Documents/cs512/venv_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/Users/liam_adams/Documents/cs512/venv_research/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 117, in forward self.norm_type, self.scale_grad_by_freq, self.sparse) File "/Users/liam_adams/Documents/cs512/venv_research/lib/python3.7/site-packages/torch/nn/functional.py", line 1506, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: index out of range at ../aten/src/TH/generic/THTensorEvenMoreMath.cpp:193
Is the problem here that the embedding is initialized with different dimensions than those of my batch array? My batch_size will be constant, but batch_max_len will change with every batch. This is how it's done in the tutorial.
Found the answer here: https://discuss.pytorch.org/t/embeddings-index-out-of-range-error/12582
I was converting words to indexes, but I had based the indexes on the total number of words rather than on vocab_size, which covers only a smaller vocabulary of the most frequent words.
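In other words, every index handed to the embedding layer must be strictly less than vocab_size. A minimal sketch of that fix (the names build_word_to_idx, word_to_idx, and UNK are illustrative, not taken from the tutorial): build the index only over the most frequent words and map everything else to a single UNK slot.

from collections import Counter

def build_word_to_idx(sentences, vocab_size):
    # count word frequencies over the tokenized training sentences
    counts = Counter(w for s in sentences for w in s)
    # keep the vocab_size - 1 most frequent words, reserving one slot for UNK
    most_common = [w for w, _ in counts.most_common(vocab_size - 1)]
    word_to_idx = {w: i for i, w in enumerate(most_common)}
    word_to_idx['UNK'] = len(word_to_idx)   # index vocab_size - 1
    return word_to_idx

def sentence_to_indices(sentence, word_to_idx):
    # every returned index is in [0, vocab_size), so nn.Embedding never sees an out-of-range value
    unk = word_to_idx['UNK']
    return [word_to_idx.get(w, unk) for w in sentence]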
You've got some things wrong. Please correct those and re-run your code:

params['vocab_size'] is the total number of unique tokens, so it should be len(vocab) in the tutorial, not the number of training samples.
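As a rough sketch (the toy vocab below is illustrative; in the tutorial, vocab is the word-to-index mapping built from the most frequent words):

import torch.nn as nn

# vocab stands in for the tutorial's word-to-index mapping (toy values here)
vocab = {'the': 0, 'cat': 1, 'sat': 2, 'UNK': 3}
params = {'vocab_size': len(vocab),   # number of unique tokens, not number of training samples
          'embedding_dim': 50}

embedding = nn.Embedding(params['vocab_size'], params['embedding_dim'])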
params['embedding_dim'] can be 50 or 100 or whatever you choose. Most folks use something in the range [50, 1000], both extremes inclusive. Both Word2Vec and GloVe use 300-dimensional embeddings for their words.
self.embedding() accepts an arbitrary batch size, so that doesn't matter. By the way, in the tutorial, comments such as # dim: batch_size x batch_max_len x embedding_dim indicate the shape of the output tensor of that specific operation, not of the inputs.
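A quick way to see that (the numbers below are illustrative, not from the tutorial):

import torch
import torch.nn as nn

batch_size, batch_max_len = 32, 12
embedding = nn.Embedding(4000, 50)

# input indices have shape batch_size x batch_max_len and values in [0, 4000)
s = torch.randint(0, 4000, (batch_size, batch_max_len))
out = embedding(s)
print(out.shape)   # torch.Size([32, 12, 50]) -> batch_size x batch_max_len x embedding_dim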