
How to save as a gensim word2vec file?

I have two lists. A is a list of words, for example ["hello", "world", ...], and len(A) is 10000. B contains the pre-trained vectors corresponding to A, with shape [10000, 512], where 512 is the vector dimension. I want to convert the two lists into gensim's word2vec model format so that I can load the model later, for example with model = Word2Vec.load("word2vec.model"). How should I do this?

HAO CHEN asked Oct 19 '25 14:10



1 Answer

As you only have the words and their vectors, you don't quite have enough info for a full Word2Vec model (which includes other things like the internal neural network's hidden weights, and word frequencies).

But you can create a gensim KeyedVectors object, of the general kind that's in a gensim Word2Vec model's .wv property. It has many of the helper methods (like most_similar()) you may be interested in using.

Let's assume your A list-of-words is in a more-helpfully named Python list called words_list, and your B list-of-vectors is in a more-helpfully named Python list called vectors_list.

Try:

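from gensim.models import KeyedVectors

# Create an empty set of 512-dimensional vectors (Python has no `new` keyword)
kv = KeyedVectors(512)
# On gensim 4.x this method is named add_vectors(); on gensim 3.x it was add()
kv.add_vectors(words_list, vectors_list)
kv.save('mywordvecs.kvmodel')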

You could then later re-load these via:

kv2 = KeyedVectors.load('mywordvecs.kvmodel')

(You could also use save_word2vec_format() and load_word2vec_format() instead of gensim's native save()/load(), if you wanted simpler plain-vectors formats that could also be loaded by other tools that use that format. But if you're staying within gensim, the plain save()/load() are just as good – and would be better if saving a more complex trained Word2Vec model, because they'd retain the extra info those objects contain.)
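
To give a feel for what that simpler plain-vectors format looks like in its text variant, here is a rough standard-library sketch: a header line with the word count and the dimension, then one line per word with its values separated by spaces. The helper names write_word2vec_text and read_word2vec_text are hypothetical and only for illustration; in practice gensim's save_word2vec_format() and load_word2vec_format() handle this for you, including the binary variant.

```python
import io


def write_word2vec_text(words, vectors, stream):
    """Write words and vectors in the plain-text word2vec layout (illustrative helper)."""
    if len(words) != len(vectors):
        raise ValueError("words and vectors must have the same length")
    dim = len(vectors[0]) if vectors else 0
    # Header line: vocabulary size, then vector dimension.
    stream.write(f"{len(words)} {dim}\n")
    for word, vector in zip(words, vectors):
        values = " ".join(repr(float(x)) for x in vector)
        stream.write(f"{word} {values}\n")


def read_word2vec_text(stream):
    """Read the layout written above back into a dict of word -> list of floats."""
    count, dim = (int(n) for n in stream.readline().split())
    table = {}
    for _ in range(count):
        parts = stream.readline().rstrip("\n").split(" ")
        vector = [float(x) for x in parts[1:]]
        if len(vector) != dim:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        table[parts[0]] = vector
    return table


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_word2vec_text(["hello", "world"], [[0.5, -1.0], [0.25, 2.0]], buffer)
buffer.seek(0)
restored = read_word2vec_text(buffer)
print(buffer.getvalue())
```

Because the file holds nothing but words and numbers, other tools can read it, but it cannot carry the extra state of a full trained model.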

gojomo answered Oct 22 '25 06:10



