I'm using the gensim implementation of Word2Vec. I have the following code snippet:
print('training model')
model = Word2Vec(Sentences(start, end))
print('trained model:', model)
print('vocab:', model.vocab.keys())
When I run this in Python 2, it works as expected: the final print shows all the words in the vocabulary.
However, if I run it in Python 3, I get an error:
trained model: Word2Vec(vocab=102, size=100, alpha=0.025)
Traceback (most recent call last):
File "learn.py", line 58, in <module>
train(to_datetime('-4h'), to_datetime('now'), 'model.out')
File "learn.py", line 23, in train
print('vocab:', model.vocab.keys())
AttributeError: 'Word2Vec' object has no attribute 'vocab'
What is going on? Is gensim's Word2Vec not compatible with Python 3?
In pre-4.0 versions, the vocabulary was in the vocab field of the Word2Vec model's wv property, as a dictionary whose keys are the tokens (words). So getting the vocabulary size was just the usual Python way of getting a dictionary's length: len(w2v_model.wv.vocab)
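In gensim 4.0+ that vocab dict is gone; the KeyedVectors object exposes key_to_index instead. A small version-agnostic helper can hide the difference (a sketch; vocab_size is my own name, not a gensim API):

```python
def vocab_size(model):
    # A trained Word2Vec model keeps its word vectors in model.wv;
    # accept either the model or the KeyedVectors object directly.
    wv = getattr(model, "wv", model)
    if hasattr(wv, "key_to_index"):      # gensim 4.0+
        return len(wv.key_to_index)
    return len(wv.vocab)                 # gensim 1.0.0 through 3.x
```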
Facebook's 'FastText' descendant of the word2vec algorithm can offer better-than-random vectors for unseen words – but it builds such vectors from word fragments (character n-gram vectors), so it does best where shared word roots exist, or where the out-of-vocabulary word is just a typo of a trained word.
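The subword idea can be sketched in plain Python: each word is wrapped in boundary markers and split into character n-grams (3 to 6 characters are FastText's defaults), and an out-of-vocabulary word gets a vector composed from its n-gram vectors. This is an illustrative sketch of the n-gram extraction, not gensim's actual implementation:

```python
def char_ngrams(word, min_n=3, max_n=6):
    # FastText wraps each word in '<' and '>' boundary markers
    # before extracting character n-grams.
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    }

# A typo shares some n-grams with the correctly spelled word,
# which is why FastText can still produce a sensible vector for it.
shared = char_ngrams("where") & char_ngrams("whree")
```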
Training the Word2Vec model: you just instantiate Word2Vec and pass in the reviews we read in the previous step. So we are essentially passing a list of lists, where each inner list contains the tokens from one user review. Word2Vec uses all these tokens to build its vocabulary internally.
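For example, the expected input shape looks like this (the reviews and the crude tokenization here are made-up placeholders; real code would use a proper tokenizer):

```python
reviews = [
    "Great phone, battery lasts all day",
    "Terrible battery life, would not buy again",
]

# One token list per review -- the "list of lists" Word2Vec expects.
sentences = [review.lower().replace(",", "").split() for review in reviews]

# model = Word2Vec(sentences)  # would build its vocabulary from these tokens
```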
Are you using the same version of gensim in both places? Gensim 1.0.0 moves vocab
to a helper object, so whereas in pre-1.0.0 versions of gensim (in Python 2 or 3), you can use:
model.vocab
...in gensim 1.0.0+ you should instead use (in Python 2 or 3)...
model.wv.vocab
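If you can't pin the gensim version on both machines, a small compatibility shim covers the pre/post-1.0.0 split (a sketch; vocab_keys is my own helper name, not a gensim API):

```python
def vocab_keys(model):
    try:
        return list(model.wv.vocab.keys())   # gensim 1.0.0 through 3.x
    except AttributeError:
        return list(model.vocab.keys())      # pre-1.0.0 gensim
```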