How to get word2index from gensim

Tags:

gensim

According to the docs, we can read a word2vec model with gensim like this:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)

This gives an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverse mapping (word-to-index) from it?


asked Nov 05 '17 02:11

GabrielChu

2 Answers

The mappings from word-to-index are in the KeyedVectors vocab property, a dictionary with objects that include an index property.

For example:

word = "whatever"  # for any word in model
i = model.vocab[word].index
model.index2word[i] == word  # will be true
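As a side note, Gensim 4.0 renamed these attributes: the word-to-index dictionary is key_to_index (mapping words directly to integers) and the index-to-word list is index_to_key. The sketch below is one possible way to write a lookup that works with either layout. The helper name word_to_index and the SimpleNamespace stand-ins are hypothetical, used only so the example runs without a real model file.

```python
from types import SimpleNamespace


def word_to_index(kv, word):
    """Return the integer index of `word` in a KeyedVectors-like object."""
    if hasattr(kv, "key_to_index"):
        # Gensim 4.x layout: a plain dict of word -> int
        return kv.key_to_index[word]
    # Gensim 3.x layout: a dict of word -> object with an `index` attribute
    return kv.vocab[word].index


# Stand-ins that only mimic the attribute layout (not real gensim objects)
old_style = SimpleNamespace(
    vocab={"hello": SimpleNamespace(index=0), "world": SimpleNamespace(index=1)},
    index2word=["hello", "world"],
)
new_style = SimpleNamespace(
    key_to_index={"hello": 0, "world": 1},
    index_to_key=["hello", "world"],
)

i_old = word_to_index(old_style, "world")
i_new = word_to_index(new_style, "world")
```

With a real model you would pass the loaded KeyedVectors object instead of the stand-ins.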

answered Nov 15 '22 17:11

gojomo

An even simpler solution is to enumerate index2word:

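# w2v is the loaded KeyedVectors model
word2index = {token: token_index for token_index, token in enumerate(w2v.index2word)}
word2index['hi'] == 30308  # True for this particular model; the index depends on the vocabulary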
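To check the enumerate approach without loading a model, here is a minimal sketch on a toy list. The function name build_word2index and the three-word list are made up for illustration; the list merely stands in for model.index2word.

```python
def build_word2index(index2word):
    """Invert a list of words into a dict of word -> position."""
    return {word: i for i, word in enumerate(index2word)}


# Toy stand-in for model.index2word (a real vocabulary is far larger)
index2word = ["the", "of", "hi"]
word2index = build_word2index(index2word)

# Round trip: every word maps back to itself through its index
round_trip_ok = all(index2word[i] == w for w, i in word2index.items())
```

The dictionary is built once in a single pass over the list, after which each lookup is an ordinary dict access.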

answered Nov 15 '22 18:11

Alex Parakhnevich
