Using Word2Vec from gensim 0.13.4.1 to update the word vectors on the fly does not work.
Calling model.build_vocab(sentences, update=False) works fine; however, calling model.build_vocab(sentences, update=True) does not.
I am following this website to try to emulate what they have done, so at some point I use the following script:
import gensim
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("./text8/text8")
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False)
model.train(sentences)
However, while this runs with update=False, using update=True gives me the following traceback:
Traceback (most recent call last):
File "word2vecAttempt.py", line 34, in <module>
model.build_vocab(sentences, progress_per=10000, update=True)
File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab
self.finalize_vocab(update=update) # build tables & arrays
File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab
self.update_weights()
File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights
self.wv.syn0 = vstack([self.wv.syn0, newsyn0])
File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I was able to reproduce your error. I think you're calling build_vocab(..., update=True) on a model that has no vocabulary yet. update=True is for extending an existing vocabulary, so only use it on a model that has already been built and trained.
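The traceback ends in numpy.vstack, which stacks the model's existing vector array on top of the rows for the new words. A minimal NumPy sketch of why that fails, assuming (a simplification of gensim's internals, not its actual code) that a model with no vocabulary yet holds an empty array while the new rows form a 2-D (n_new_words, vector_size) block:

```python
import numpy as np

# Assumption: a model with no vocabulary yet holds an empty vector array,
# while the rows for new words form a 2-D (n_new_words, vector_size) block.
# This only mimics the vstack call from the traceback; it is not gensim code.
vector_size = 100
existing = np.array([])                    # no vectors yet: shape (0,)
new_rows = np.zeros((3, vector_size))      # rows for 3 new words

# vstack promotes the empty array to shape (1, 0): its width is 0, not 100.
print(np.atleast_2d(existing).shape)
try:
    np.vstack([existing, new_rows])
except ValueError as err:
    print("ValueError:", err)

# With an existing (n_words, vector_size) matrix the same call succeeds.
trained = np.zeros((5, vector_size))
print(np.vstack([trained, new_rows]).shape)
```

The widths (0 vs. 100) disagree along the non-concatenation axis, which is exactly what the ValueError message complains about.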
This works:
import gensim
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=False)  # initial build: creates the vocabulary and weights
model.train(sentences)
model.build_vocab(sentences, update=True)   # now there is an existing vocabulary to extend
model.train(sentences)
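The working order is: a first build_vocab with update=False allocates the vector matrix, and later calls with update=True only append rows for words not seen before. The class below, ToyVocabModel, is hypothetical and is not gensim's API. It is a heavily simplified sketch of that ordering that raises an explicit error instead of the opaque vstack failure:

```python
import numpy as np


class ToyVocabModel:
    """Hypothetical stand-in for a word2vec vocabulary: it only tracks
    word -> row index and a (n_words, vector_size) vector matrix."""

    def __init__(self, vector_size=100, seed=1):
        self.vector_size = vector_size
        self.rng = np.random.default_rng(seed)
        self.vocab = {}
        self.vectors = None  # nothing is allocated until the first build

    def _random_rows(self, n):
        # Small random initial values, one row per word.
        return (self.rng.random((n, self.vector_size)) - 0.5) / self.vector_size

    def build_vocab(self, sentences, update=False):
        words = [w for sentence in sentences for w in sentence]
        if not update:
            # Fresh build: discard old state and allocate one row per unique word.
            unique = list(dict.fromkeys(words))
            self.vocab = {w: i for i, w in enumerate(unique)}
            self.vectors = self._random_rows(len(unique))
            return
        if self.vectors is None:
            # Explicit guard in place of the vstack shape error.
            raise RuntimeError("update=True needs an existing vocabulary; "
                               "call build_vocab(..., update=False) first")
        new = [w for w in dict.fromkeys(words) if w not in self.vocab]
        for w in new:
            self.vocab[w] = len(self.vocab)
        # Keep the existing rows and append rows only for the new words.
        self.vectors = np.vstack([self.vectors, self._random_rows(len(new))])


model = ToyVocabModel(vector_size=4)
model.build_vocab([["the", "cat"], ["the", "dog"]])                # 3 words
model.build_vocab([["a", "cat"], ["a", "fish"]], update=True)      # adds "a", "fish"
print(model.vectors.shape)

try:
    ToyVocabModel().build_vocab([["hello"]], update=True)
except RuntimeError as err:
    print(err)
```

Updating keeps the rows for "the", "cat" and "dog" and appends two rows, so the matrix grows from 3 to 5 rows, while calling update=True on a fresh model fails up front.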
But this will fail:
import gensim
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=True)   # no existing vocabulary to update yet
model.train(sentences)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Tested with gensim 0.13.4.1, the latest version at the time.