According to the docs, we can use this to read a word2vec model with gensim:
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)
The loaded model exposes an index-to-word mapping, e.g., model.index2word[2]. How can I derive the inverted mapping (word-to-index) from it?
To assess which word2vec model is best, simply calculate the distance for each pair, do it 200 times, sum up the total distance, and the smallest total distance will be your best model.
Word2Vec is a widely used word representation technique that uses neural networks under the hood. The resulting word representation or embeddings can be used to infer semantic similarity between words and phrases, expand queries, surface related concepts and more.
This saved model can be loaded again using load(), which supports online training and getting vectors for vocabulary words. The fname (str) parameter is the path to the file. Scoring the log probability for a sequence of sentences does not change the fitted model in any way (see train() for that).
The word-to-index mappings are in the KeyedVectors vocab property, a dictionary whose values are objects that include an index property.
For example:
word = "whatever" # for any word in model
i = model.vocab[word].index
model.index2word[i] == word # will be true
An even simpler solution is to enumerate index2word:
word2index = {token: token_index for token_index, token in enumerate(model.index2word)}
word2index['hi'] == 30308 # True for this particular model; the index depends on the vocabulary
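Both answers above target the older gensim API. As far as I know, gensim 4.0 renamed index2word to index_to_key and replaced vocab with a ready-made key_to_index dictionary, so a version-tolerant helper might look like the sketch below. The function name word_to_index is hypothetical, and the SimpleNamespace objects are only stand-ins for a loaded KeyedVectors model so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def word_to_index(kv):
    """Return a word -> index dict for a KeyedVectors-like object.

    Hypothetical helper: it only assumes the attribute names
    key_to_index (gensim >= 4.0) or index2word (older releases).
    """
    # Newer gensim already stores the inverted mapping.
    mapping = getattr(kv, "key_to_index", None)
    if mapping is not None:
        return dict(mapping)
    # Older gensim: invert the index -> word list by enumerating it.
    return {word: i for i, word in enumerate(kv.index2word)}


# Stand-ins for real models (assumption: real KeyedVectors expose the same attributes).
old_style = SimpleNamespace(index2word=["the", "of", "hi"])
new_style = SimpleNamespace(key_to_index={"the": 0, "hi": 1})

old_map = word_to_index(old_style)
new_map = word_to_index(new_style)
```

With a real model you would pass the object returned by KeyedVectors.load_word2vec_format instead of the stand-ins.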