Save gensim Word2vec model in binary format .bin with save_word2vec_format

I'm training my own word2vec model on different data. To plug the resulting model into my classifier and compare the results with the original pre-trained Word2vec model, I need to save it in the binary .bin format. Here is my code; sentences streams short messages from dati.txt, one per line.

import gensim, logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences = gensim.models.word2vec.LineSentence('dati.txt')
model = gensim.models.Word2Vec(
sentences, size=300, window=5, min_count=5, workers=5,
sg=1, hs=1, negative=0
)
model.save_word2vec_format('model.bin', binary=True)

The last call, save_word2vec_format, raises this error:

AttributeError: 'Word2Vec' object has no attribute 'save_word2vec_format'

What am I missing here? I've read the gensim documentation and other forums. This repo on GitHub uses almost the same configuration, so I can't see what's wrong. I've tried switching from skip-gram to CBOW and from hierarchical softmax to negative sampling, with no change.

Thank you in advance!

asked Feb 22 '17 18:02 by carloab


2 Answers

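Call save_word2vec_format on the model's word vectors (model.wv), not on the Word2Vec object itself:

from gensim.models import Word2Vec, KeyedVectors
model.wv.save_word2vec_format('model.bin', binary=True)
# reload later with: KeyedVectors.load_word2vec_format('model.bin', binary=True)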
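To see what binary=True actually produces (useful when the file is consumed by a classifier outside gensim), here is a minimal stdlib sketch of the word2vec .bin layout: a text header "<vocab size> <dimensions>\n", then for each word its UTF-8 bytes, a single space, and the vector as raw 32-bit floats. It assumes little-endian float32 (what x86 machines write); write_word2vec_bin and read_word2vec_bin are hypothetical names, not gensim functions. As far as I know, gensim writes no separator between entries, while the original C word2vec tool adds a newline after each vector, so the reader skips newlines in front of a word.

```python
import io
import struct

# Hypothetical helpers (not part of gensim) showing the word2vec .bin layout.
# Assumes little-endian float32 vectors, as written on x86 machines.


def write_word2vec_bin(fout, vectors):
    """Write a {word: list_of_floats} dict in word2vec binary layout."""
    dim = len(next(iter(vectors.values())))
    # Header: "<vocab size> <vector size>\n" as plain UTF-8 text
    fout.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        # Entry: UTF-8 word, one space, then `dim` raw float32 values
        fout.write(word.encode("utf-8") + b" " + struct.pack(f"<{dim}f", *vec))


def read_word2vec_bin(fin):
    """Read the layout above back into a {word: tuple_of_floats} dict."""
    vocab_size, dim = map(int, fin.readline().decode("utf-8").split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = fin.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("truncated word2vec file")
            if ch != b"\n":  # skip the newline some writers put after each vector
                word.extend(ch)
        raw = fin.read(4 * dim)
        vectors[word.decode("utf-8")] = struct.unpack(f"<{dim}f", raw)
    return vectors


buf = io.BytesIO()
write_word2vec_bin(buf, {"hello": [0.5, -1.0, 2.0], "world": [1.0, 0.25, 0.0]})
buf.seek(0)
print(read_word2vec_bin(buf))
# prints {'hello': (0.5, -1.0, 2.0), 'world': (1.0, 0.25, 0.0)}
```

In practice KeyedVectors.load_word2vec_format('model.bin', binary=True) does this parsing for you; the sketch only makes the on-disk layout explicit.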
answered Nov 09 '22 23:11 by Kiran


Are you using a pre-release (release-candidate) version of gensim, or code taken directly from the develop branch?

In those versions, save_word2vec_format() has moved to a utility class called KeyedVectors, which the trained model exposes as model.wv.

As of February 2017, you won't get these versions through the usual installation route, pip install gensim. By the time this change reaches an official release, the error message for the older call will likely have been improved.

I recommend using the version that comes via plain pip install gensim, unless you are a relatively expert user who is also carefully following the project's CHANGELOG.md.
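If the same script has to run under both older releases and ones where the method moved to KeyedVectors, one hedged option is to look for model.wv first and fall back to the model itself. This is a minimal sketch: save_vectors_bin is a hypothetical helper name, and it assumes the only difference between versions is which object carries save_word2vec_format.

```python
def save_vectors_bin(model, path):
    """Save a trained Word2Vec model's vectors in word2vec .bin format.

    Hypothetical helper: assumes newer gensim puts save_word2vec_format on
    model.wv (a KeyedVectors object) and older releases put it on the model.
    """
    # Newer gensim: the word vectors live in a KeyedVectors object at model.wv
    target = getattr(model, "wv", None)
    if target is None or not hasattr(target, "save_word2vec_format"):
        # Older gensim: the method was defined on Word2Vec directly
        target = model
    target.save_word2vec_format(path, binary=True)
```

Usage would be save_vectors_bin(model, 'model.bin') right after training; preferring model.wv means that on releases offering both spellings, the non-deprecated one is used.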

answered Nov 10 '22 01:11 by gojomo