I trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code:
model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.wv.vocab.keys()
self.word2vec = {word:model.wv[word]%EMBEDDING_DIM for word in words}
I got an error saying that "model.wv" has been removed in Gensim 4.0.0. Then I used the following code:
model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.vocab.keys()
word2vec = {word:model[word]%EMBEDDING_DIM for word in words}
And I get the following error:
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
Can anyone please suggest how I can use the pretrained model and return a dictionary in Gensim 4.0.0?
The changes caused by the migration from Gensim 3.x to 4 are all documented in the migration guide on GitHub:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
For the above problem, the solution that worked for me:
words = list(model.wv.index_to_key)  # model is a full Word2Vec model here; for a bare KeyedVectors, drop the .wv
The migration notes explain major changes & how to adapt your code:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
Per the guidance there, to just get a list of the words, since your model variable is already an instance of KeyedVectors, you can use:
model.index_to_key
Your code doesn't show a need for a dict, but there is a slightly-different word-to-index-position dict in model.key_to_index. However, you can just use model[key] like before to get individual vectors.
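If you do still want a plain word-to-vector dict, a minimal sketch could look like the following. It relies only on index_to_key and model[key]. The helper name build_word2vec_dict and the FakeKeyedVectors stand-in are hypothetical, added only so the sketch runs without a model file; with Gensim 4 you would pass in the object returned by KeyedVectors.load_word2vec_format instead.

```python
def build_word2vec_dict(kv):
    """Map every word in the vocabulary to its vector.

    `kv` is anything exposing `index_to_key` and `kv[word]`,
    which is what a Gensim 4 KeyedVectors provides.
    """
    return {word: kv[word] for word in kv.index_to_key}


class FakeKeyedVectors:
    """Hypothetical stand-in mimicking the members used above."""

    def __init__(self, vectors):
        self._vectors = dict(vectors)
        # Same roles as the Gensim 4 attributes of the same names.
        self.index_to_key = list(self._vectors)
        self.key_to_index = {w: i for i, w in enumerate(self.index_to_key)}

    def __getitem__(self, key):
        return self._vectors[key]


# With Gensim 4 installed, kv would instead come from:
#   kv = KeyedVectors.load_word2vec_format(wv_path, binary=False)
kv = FakeKeyedVectors({"king": [0.1, -0.2], "queen": [0.3, 0.4]})
word2vec = build_word2vec_dict(kv)
```

Note that the dict holds a reference to every vector, so for large vocabularies it is usually simpler to keep the KeyedVectors object and index it directly.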
(Separately: I can't imagine your %EMBEDDING_DIM is doing anything useful. Why would you want to perform an elementwise % modulus operation, using the integer count of dimensions, against individual dimensions that are often small floating-point numbers? For non-negative values it changes nothing, as the EMBEDDING_DIM will usually be far larger than the individual values, but Python's % wraps negative values up into the range [0, EMBEDDING_DIM), so it can silently distort the vectors, and it doesn't serve any good purpose.)
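To see what the modulus does to individual components, here is a small sketch using plain Python floats. EMBEDDING_DIM = 300 and the sample values are assumptions chosen for illustration; NumPy's elementwise % follows the same sign convention as Python's.

```python
EMBEDDING_DIM = 300  # assumed value, not taken from the question

# Sample components, chosen so the arithmetic is exact in floating point.
components = [0.25, -0.25, 1.5, -1.5]

# Python's % returns a result with the sign of the divisor, so
# non-negative values pass through unchanged while negative values
# are shifted up by EMBEDDING_DIM.
wrapped = [x % EMBEDDING_DIM for x in components]

for before, after in zip(components, wrapped):
    print(f"{before:>6} -> {after}")
```

Since trained word vectors normally contain negative components, dropping the % EMBEDDING_DIM is the safer choice.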