 

How to check if a key exists in a word2vec trained model or not

I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following piece of code to get the raw feature vector of a word, say "view":

myModel["view"]

However, I get a KeyError for this word, probably because it does not exist as a key in the vocabulary indexed by word2vec. How can I check whether a key exists in the index before trying to get the raw feature vector?

asked May 18 '15 11:05 by London guy



People also ask

How do I test a Word2vec model?

To assess which word2vec model is best, take a set of word pairs that should be similar (for example, 200 pairs), compute the distance between the two words of each pair in every model, and sum those distances; the model with the smallest total distance is the best.
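A minimal sketch of that comparison, assuming "distance" means cosine distance (1 minus cosine similarity) and that each model can be looked up like a dict from word to vector. The toy models, the word pairs, and the helper names (`cosine_distance`, `total_pair_distance`, `best_model`) are hypothetical; with real gensim models you could pass `model.wv` instead of a dict, and KeyedVectors also offers a `distance(w1, w2)` method for the per-pair step.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def total_pair_distance(vectors, pairs):
    """Sum the distances of word pairs that are expected to be similar."""
    return sum(cosine_distance(vectors[w1], vectors[w2]) for w1, w2 in pairs)


def best_model(models, pairs):
    """Return the name of the model with the smallest total pair distance."""
    return min(models, key=lambda name: total_pair_distance(models[name], pairs))


# Hypothetical toy "models": plain dicts mapping words to 2-d vectors.
models = {
    "model_a": {"car": [1.0, 0.0], "automobile": [0.9, 0.1],
                "king": [0.0, 1.0], "monarch": [0.1, 0.9]},
    "model_b": {"car": [1.0, 0.0], "automobile": [0.0, 1.0],
                "king": [0.0, 1.0], "monarch": [1.0, 0.0]},
}
pairs = [("car", "automobile"), ("king", "monarch")]
print(best_model(models, pairs))  # model_a
```

In `model_b` both pairs are orthogonal (distance 1.0 each), so `model_a`, whose pairs point in nearly the same direction, wins.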

How is a Word2vec model trained?

To train a neural network like this, we repeat the following steps: take a training sample and compute the network's output value; evaluate the loss by comparing the model's prediction with the true output label; and update the network's weights by applying gradient descent to that loss.
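The three steps can be sketched on a single logistic unit that scores a (center word, context word) pair, which is the building block of skip-gram with negative sampling. This is a pure-Python illustration under that assumption, not gensim's optimized implementation; the vectors, learning rate, and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(center, context, label, lr=0.1):
    """One gradient-descent step on a single (center, context, label) sample.

    label is 1 for an observed pair and 0 for a negative sample.
    Returns the loss measured before the update; both vectors are updated in place.
    """
    # 1. Forward pass: the output is the predicted probability that the pair is real.
    score = sum(c * o for c, o in zip(center, context))
    p = sigmoid(score)
    # 2. Loss: binary cross-entropy between the prediction and the true label.
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    # 3. Gradient descent: d(loss)/d(score) = p - label; compute both
    #    gradients before changing either vector.
    g = p - label
    grad_center = [g * o for o in context]
    grad_context = [g * c for c in center]
    for i in range(len(center)):
        center[i] -= lr * grad_center[i]
        context[i] -= lr * grad_context[i]
    return loss


center = [0.1, -0.2, 0.05]
context = [0.0, 0.1, -0.1]
losses = [train_step(center, context, label=1) for _ in range(50)]
print(losses[0] > losses[-1])  # True
```

Repeating the step on an observed pair pushes the two vectors toward each other, so the loss shrinks over the iterations.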

What is Min_count in Word2vec?

min_count: the minimum number of times a word must occur in the corpus to be considered when training the model; words that occur fewer times than this are ignored. The default for min_count is 5.
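One common reason for the KeyError in the question is that the word occurred fewer than min_count times and was dropped from the vocabulary. A minimal sketch of that filtering rule in plain Python (the `build_vocab` name and the toy corpus are hypothetical; gensim applies the rule internally when it builds its vocabulary):

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Keep only words that occur at least min_count times across the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


sentences = [["the", "view", "is", "nice"]] * 4 + [["the", "city"]]
vocab = build_vocab(sentences)   # default min_count=5, mirroring gensim's default
print(sorted(vocab))             # ['the']
print("view" in vocab)           # False: "view" appears only 4 times
```

Lowering min_count (for example to 1) keeps rare words such as "view", at the cost of training vectors from very few examples.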

What is Gensim Word2vec trained on?

The pre-trained Google word2vec model was trained on Google News data (about 100 billion words); it contains 3 million words and phrases, each represented by a 300-dimensional vector. The file is 1.53 gigabytes and is distributed as GoogleNews-vectors-negative300.


2 Answers

Word2Vec also provides a 'vocab' member that you can access directly.

Using a Pythonic approach:

if word in w2v_model.vocab:
    vector = w2v_model[word]  # safe: word is in the vocabulary

EDIT: Since gensim release 2.0, the Word2Vec API has changed. To access the vocabulary, you should now use:

if word in w2v_model.wv.vocab:
    vector = w2v_model.wv[word]  # safe: word is in the vocabulary

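EDIT 2: In gensim 4.0.0 the 'vocab' dictionary was removed from KeyedVectors (the 'wv' attribute remains), so w2v_model.wv.vocab raises an AttributeError. Test membership on the KeyedVectors object itself instead (for a standalone KeyedVectors object, use word in w2v_model):

if word in w2v_model.wv:
    vector = w2v_model.wv[word]  # safe: word is in the vocabulary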
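A minimal version-agnostic sketch of the check above, assuming the vectors object supports both `in` and `[]` lookups, as gensim's KeyedVectors (e.g. `model.wv`) does; in gensim 4.x you can also test `word in model.wv.key_to_index`. The helper name `vector_or_default` and the dict stand-in are hypothetical, so the example runs without gensim installed.

```python
def vector_or_default(word_vectors, word, default=None):
    """Return the vector for `word`, or `default` when it is not in the vocabulary.

    `word_vectors` can be anything that supports `in` and `[]` lookups,
    such as gensim's KeyedVectors (e.g. model.wv) or a plain dict.
    """
    if word in word_vectors:
        return word_vectors[word]
    return default


# Hypothetical stand-in for model.wv so the example runs without gensim.
word_vectors = {"city": [0.2, 0.7], "river": [0.5, 0.1]}
print(vector_or_default(word_vectors, "city"))   # [0.2, 0.7]
print(vector_or_default(word_vectors, "view"))   # None
```

With a trained model you would call `vector_or_default(model.wv, "view")`, which avoids the KeyError from the question.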
answered Oct 09 '22 20:10 by Matt Fortier



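Get the model's word vectors (a KeyedVectors object) with

word_vectors = model.wv

then check membership with the following (gensim 3.x; 'vocab' was removed in gensim 4.0.0):

if 'word' in word_vectors.vocab:
    vector = word_vectors['word']  # safe: the key is in the vocabulary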
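An alternative to checking first is to attempt the lookup and catch the KeyError the question describes (the "easier to ask forgiveness" style). A minimal sketch that averages the vectors of the known tokens in a document and skips unknown ones; the `mean_vector` name and the dict stand-in for `model.wv` are hypothetical, and it assumes lookups of missing words raise KeyError, as they do for gensim's KeyedVectors.

```python
def mean_vector(word_vectors, tokens):
    """Average the vectors of the tokens that are in the vocabulary.

    Out-of-vocabulary tokens raise KeyError on lookup and are skipped,
    so no separate membership test is needed.
    Returns None when none of the tokens are known.
    """
    known = []
    for token in tokens:
        try:
            known.append(word_vectors[token])
        except KeyError:
            continue  # token not in the vocabulary
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical stand-in for model.wv so the example runs without gensim.
word_vectors = {"city": [0.25, 0.5], "river": [0.75, 1.5]}
print(mean_vector(word_vectors, ["city", "view", "river"]))  # [0.5, 1.0]
```

Catching KeyError does a single lookup per token, while the `in` check does two; either is fine for occasional lookups.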
answered Oct 09 '22 19:10 by rakaT
