Gensim 3.8.0 to Gensim 4.0.0

I trained a Word2Vec model using Gensim 3.8.0. Later I tried to load the pretrained model with Gensim 4.0.0 on GCP. I used the following code:

    model = KeyedVectors.load_word2vec_format(wv_path, binary=False)
    words = model.wv.vocab.keys()
    self.word2vec = {word: model.wv[word] % EMBEDDING_DIM for word in words}

I got an error saying that "model.wv" has been removed in Gensim 4.0.0. Then I used the following code:

    model = KeyedVectors.load_word2vec_format(wv_path, binary=False)
    words = model.vocab.keys()
    word2vec = {word: model[word] % EMBEDDING_DIM for word in words}

And I get the following error:

AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Can anyone please suggest how I can use the pretrained model and return a dictionary in Gensim 4.0.0?

asked Mar 30 '21 09:03

2 Answers

The changes required when migrating from Gensim 3.x to 4 are all documented on the GitHub wiki:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

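For the above problem, the solution that worked for me:

    words = list(model.wv.index_to_key)

Note that .wv exists only on a full Word2Vec model. If model is already a KeyedVectors instance, as in the question, drop the .wv and use model.index_to_key directly.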

answered Sep 19 '22 17:09

Debangan Mandal

The migration notes explain major changes & how to adapt your code:

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

Per the guidance there, to just get a list of the words, since your model variable is already an instance of KeyedVectors, you can use:

    model.index_to_key

Your code doesn't show a need for a dict, but there is a slightly-different word-to-index-position dict in model.key_to_index. However, you can just use model[key] like before to get individual vectors.

(Separately: I can't imagine your %EMBEDDING_DIM is doing anything useful. Why would you want to perform an elementwise % modulus operation, using the integer count of dimensions, against individual dimensions that are often small floating-point numbers? It leaves non-negative values unchanged, as EMBEDDING_DIM will usually be far larger than the individual values, but it is not harmless: NumPy's % takes the sign of the divisor, so every negative component is shifted up by EMBEDDING_DIM (for example, -0.5 % 300 gives 299.5). It doesn't serve any good purpose.)
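To see what the modulus actually does to a vector, here is a small NumPy sketch. The value 300 for EMBEDDING_DIM and the vector values are assumptions chosen for illustration. Small non-negative components pass through unchanged, but negative components do not, because NumPy's % follows the sign of the divisor:

```python
import numpy as np

EMBEDDING_DIM = 300  # hypothetical value; the question does not state it

# A made-up embedding with the kind of small values word vectors usually hold.
vector = np.array([0.25, -0.5, 0.0], dtype=np.float32)

result = vector % EMBEDDING_DIM

# 0.25 and 0.0 are unchanged, but -0.5 becomes 299.5.
print(result)
```

Dropping the % EMBEDDING_DIM and storing model[word] directly keeps the vectors intact.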

answered Sep 22 '22 17:09

gojomo

Tags: python, nlp, word-embedding, gensim, word2vec

Md. Ahsanul Kabir Arif
