
How to get word2index from gensim

Tags:

gensim

According to the docs, a word2vec model can be loaded with gensim like this:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)

The model exposes an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverse (word-to-index) mapping from it?

GabrielChu asked Nov 05 '17 02:11




2 Answers

The word-to-index mappings are in the KeyedVectors vocab property, a dictionary whose values are objects that include an index property.

For example:

word = "whatever"  # for any word in model
i = model.vocab[word].index
model.index2word[i] == word  # will be true
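
Note that the vocab property belongs to gensim 3.x. In gensim 4.x it was replaced by key_to_index, a plain dict from word to integer index. Below is a minimal sketch of a helper that copes with both layouts. The function name get_word2index and the stand-in objects are hypothetical and only illustrate the attribute lookups; with a real model you would pass the loaded KeyedVectors instead.

```python
from types import SimpleNamespace


def get_word2index(kv):
    """Build a word -> index dict from a KeyedVectors-like object (hypothetical helper)."""
    # gensim 4.x: the mapping is exposed directly as `key_to_index`.
    key_to_index = getattr(kv, "key_to_index", None)
    if key_to_index is not None:
        return dict(key_to_index)
    # gensim 3.x: each `vocab` entry is an object carrying an `index` attribute.
    return {word: entry.index for word, entry in kv.vocab.items()}


# Stand-in objects that mimic the two attribute layouts (not real models).
old_style = SimpleNamespace(
    vocab={"the": SimpleNamespace(index=0), "cat": SimpleNamespace(index=1)}
)
new_style = SimpleNamespace(key_to_index={"the": 0, "cat": 1})

old_mapping = get_word2index(old_style)
new_mapping = get_word2index(new_style)
```

Both calls should produce the same dictionary, so code written against the result does not need to care which gensim version loaded the vectors.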
gojomo answered Nov 15 '22 17:11



An even simpler solution is to enumerate index2word:

# model is the loaded KeyedVectors instance
word2index = {token: token_index for token_index, token in enumerate(model.index2word)}
word2index['hi'] == 30308  # True for my model; the index depends on the vocabulary
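
To check that the enumerated dictionary really is the inverse of index2word, a round trip can be run over every entry. This is a small sketch on a stand-in list rather than a real model. The name invert_index and the sample tokens are assumptions made for illustration.

```python
def invert_index(index2word):
    """Invert a list of tokens into a token -> position dict (hypothetical helper)."""
    word2index = {word: i for i, word in enumerate(index2word)}
    # A repeated token would silently overwrite an earlier position, so fail loudly.
    if len(word2index) != len(index2word):
        raise ValueError("index2word contains duplicate tokens")
    return word2index


# Stand-in for model.index2word; a real vocabulary is much larger.
index2word = ["the", "of", "hi"]
word2index = invert_index(index2word)

# Round trip: looking a word up and indexing back must return the same word.
round_trip_ok = all(index2word[i] == word for word, i in word2index.items())
```

With a loaded model, the same function could be called as invert_index(model.index2word).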
Alex Parakhnevich answered Nov 15 '22 18:11
