I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following code to get the raw feature vector of a word, say "view":
myModel["view"]
However, I get a KeyError for this word, probably because it doesn't exist as a key in the vocabulary that word2vec has indexed. How can I check whether a key exists in the index before trying to get the raw feature vector?
To assess which word2vec model is best, calculate the distance for each pair, repeat this 200 times, and sum up the total distance; the model with the smallest total distance is the best one.
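A minimal sketch of that comparison follows. It assumes the "pairs" are word pairs that should be close in a good embedding, that "distance" means cosine distance, and that each model can be treated as a plain mapping from word to vector; `cosine_distance`, `total_pair_distance` and `pick_best_model` are hypothetical helper names, not gensim API.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def total_pair_distance(vectors, pairs):
    """Sum the distance over every (word_a, word_b) pair for one model.

    `vectors` is any word -> vector mapping (a plain dict in this sketch).
    """
    return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)


def pick_best_model(models, pairs):
    """Return the name of the model whose summed pair distance is smallest."""
    return min(models, key=lambda name: total_pair_distance(models[name], pairs))


if __name__ == "__main__":
    # Toy vectors standing in for two trained models (hypothetical data).
    models = {
        "model_a": {"cat": [1.0, 0.0], "dog": [0.9, 0.1]},
        "model_b": {"cat": [1.0, 0.0], "dog": [0.0, 1.0]},
    }
    print(pick_best_model(models, [("cat", "dog")]))
```

With real models, the 200 pairs would come from an evaluation list you trust; the sketch only shows the sum-and-minimise step.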
To train neural networks like this, we follow these steps:
1. Take a training sample and generate the output value of the network.
2. Evaluate the loss by comparing the model's prediction with the true output label.
3. Update the weights of the network by applying gradient descent to the evaluated loss.
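The three steps can be sketched on the smallest possible "network", a single linear unit `y = w * x + b` trained with squared-error loss. This is an illustration of the loop only, not how word2vec computes its loss; `train_linear_unit`, the learning rate and the epoch count are assumptions chosen for the toy data.

```python
def train_linear_unit(samples, lr=0.05, epochs=500):
    """Fit y = w * x + b by per-sample gradient descent on squared error.

    Returns (w, b, loss summed over the final epoch).
    """
    w, b = 0.0, 0.0
    epoch_loss = 0.0
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y_true in samples:
            # 1. Forward pass: generate the unit's output for this sample.
            y_pred = w * x + b
            # 2. Loss: squared error between prediction and true label.
            error = y_pred - y_true
            epoch_loss += error ** 2
            # 3. Gradient descent: dL/dw = 2 * error * x, dL/db = 2 * error.
            w -= lr * 2 * error * x
            b -= lr * 2 * error
    return w, b, epoch_loss


if __name__ == "__main__":
    # Noise-free samples from y = 2x + 1 (toy data).
    data = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
    print(train_linear_unit(data))
```

Updating after every sample (rather than after a full pass) mirrors the "take a training sample, then update" order described above.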
min_count: The minimum number of times a word must occur to be considered when training the model; words that occur fewer times than this are ignored. The default for min_count is 5.
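The min_count rule can be illustrated with a word counter: only words occurring at least min_count times survive, so a rare word such as "view" never enters the vocabulary, and looking it up later raises a KeyError. `build_vocab` below is a hypothetical helper that mirrors the rule; it is not gensim's implementation.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Return {word: count} for words occurring at least `min_count` times.

    Hypothetical illustration of the min_count rule, not gensim's code.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"], ["a"]]
    print(build_vocab(corpus, min_count=2))
```

When training a small test corpus in gensim, passing min_count=1 keeps every word, at the cost of noisy vectors for rare ones.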
The pre-trained Google word2vec model was trained on Google News data (about 100 billion words); it contains 3 million words and phrases and was fit using 300-dimensional word vectors. It is a 1.53 GB file. You can download it from here: GoogleNews-vectors-negative300.
Word2Vec also provides a 'vocab' member, which you can access directly.
Using a Pythonic approach:
if word in w2v_model.vocab:
    vector = w2v_model[word]  # Do something with the vector
EDIT Since gensim release 2.0, the API for Word2Vec changed. To access the vocabulary you should now use this:
if word in w2v_model.wv.vocab:
    vector = w2v_model.wv[word]  # Do something with the vector
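EDIT 2 In gensim 4.0.0 it is the 'vocab' attribute, not 'wv', that was removed: w2v_model.wv.vocab now raises an AttributeError. Check membership against the KeyedVectors object itself (or its 'key_to_index' dict) instead:
if word in w2v_model.wv:
    vector = w2v_model.wv[word]  # Do something with the vector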
Get the model's word vectors with
word_vectors = model.wv
then, in gensim versions before 4.0, check membership with
if 'word' in word_vectors.vocab:
    vector = word_vectors['word']
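The answers above differ only in where the vectors live. A version-tolerant sketch is to look for a 'wv' attribute and fall back to the model itself, then test with `in`, which gensim's KeyedVectors supports; the alternative is to attempt the lookup and catch the KeyError. Both helpers (`get_vector`, `get_vector_eafp`) are hypothetical names, and the sketch assumes the vector container supports `in` and `[]`.

```python
def get_vector(model, word):
    """Return the vector for `word`, or None if it is not in the vocabulary.

    Assumption: newer gensim keeps vectors on `model.wv`; very old releases
    exposed them on the model itself, so fall back to `model` when `wv` is absent.
    """
    vectors = getattr(model, "wv", model)
    if word in vectors:  # look before you leap
        return vectors[word]
    return None


def get_vector_eafp(model, word):
    """Same result, but try the lookup and handle the KeyError instead."""
    vectors = getattr(model, "wv", model)
    try:
        return vectors[word]
    except KeyError:
        return None
```

Usage: `vec = get_vector(w2v_model, "view")`, then handle `vec is None` instead of letting the KeyError propagate.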