
Gensim 3.8.0 to Gensim 4.0.0

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code:

model = KeyedVectors.load_word2vec_format(wv_path, binary=False)
words = model.wv.vocab.keys()
self.word2vec = {word: model.wv[word] % EMBEDDING_DIM for word in words}

I got an error saying that model.wv has been removed in Gensim 4.0.0. Then I used the following code:

model = KeyedVectors.load_word2vec_format(wv_path, binary=False)
words = model.vocab.keys()
word2vec = {word: model[word] % EMBEDDING_DIM for word in words}

And I got the following error:

AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Can anyone please suggest how I can use the pretrained model and return a dictionary in Gensim 4.0.0?

asked Mar 30 '21 by Md. Ahsanul Kabir Arif


2 Answers

The changes introduced by the migration from Gensim 3.x to 4 are all documented in the GitHub wiki:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

For the above problem, this is the solution that worked for me:

    words = list(model.wv.index_to_key)
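To put that into the context of the original dict-building code, here is a minimal sketch (assuming wv_path points to the same word2vec-format file as in the question, and leaving out the % EMBEDDING_DIM step; with a full Word2Vec model you would keep the model.wv prefix as above, while a KeyedVectors loaded via load_word2vec_format exposes the same attributes directly):

    from gensim.models import KeyedVectors

    # Load the vectors exactly as in the question (word2vec text format).
    model = KeyedVectors.load_word2vec_format(wv_path, binary=False)

    # Gensim 4: .index_to_key replaces the removed .vocab.keys().
    words = list(model.index_to_key)

    # model[word] still returns the numpy vector for a given word.
    word2vec = {word: model[word] for word in words}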
answered Sep 19 '22 by Debangan Mandal

The migration notes explain the major changes and how to adapt your code:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Per the guidance there, since your model variable is already an instance of KeyedVectors, you can get just the list of words with:

model.index_to_key

Your code doesn't show a need for a dict, but there is a slightly-different word-to-index-position dict in model.key_to_index. However, you can just use model[key] like before to get individual vectors.
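
For instance, if you do still want a plain word-to-vector dict like in your original code, a rough sketch using those attributes (leaving out the % EMBEDDING_DIM, per the note below):

# The rows of model.vectors line up with model.index_to_key,
# so zip() pairs each word with its vector without per-word lookups.
word2vec = dict(zip(model.index_to_key, model.vectors))

# Equivalent, using the same model[key] lookup style as before:
word2vec = {word: model[word] for word in model.index_to_key}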

(Separately: I can't imagine your %EMBEDDING_DIM is doing anything useful. Why would you want to perform an elementwise % modulus operation, using the integer count of dimensions, against individual dimensions that are often small floating-point numbers? It'll often be harmless, as the EMBEDDING_DIM will usually be far larger than the individual values, but it doesn't serve any good purpose.)

answered Sep 22 '22 by gojomo