 

gensim word2vec - array dimensions in updating with online word embedding

Using Word2Vec from gensim 0.13.4.1 to update the word vectors on the fly does not work.

model.build_vocab(sentences, update=False)

works fine; however,

model.build_vocab(sentences, update=True)

does not.


I am following this website and trying to emulate what they did, so at some point I use the following script:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("./text8/text8")
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False)
model.train(sentences)

However, while this runs with update=False, using update=True gives me the following traceback:

Traceback (most recent call last):
  File "word2vecAttempt.py", line 34, in <module>
    model.build_vocab(sentences, progress_per=10000, update=True)
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab
    self.finalize_vocab(update=update)  # build tables & arrays
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab
    self.update_weights()
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights
    self.wv.syn0 = vstack([self.wv.syn0, newsyn0])
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
chase asked Feb 21 '17 at 02:02



1 Answer

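I was able to reproduce your error. You are calling build_vocab(..., update=True) on a model that has no vocabulary yet. update=True is meant for extending a model that already has one: judging from the traceback, update_weights() uses vstack to append rows for the new words to the existing weight matrix self.wv.syn0, and on a fresh model there is no properly shaped matrix to append to. Build the initial vocabulary (and train) with update=False first, and only use update=True after that.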
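To see why the traceback ends in a NumPy error, here is a minimal NumPy-only sketch of the failing step. It assumes, based on the traceback's `vstack([self.wv.syn0, newsyn0])` line rather than on gensim's full source, that `update=True` appends one row per new word to the existing weight matrix, and that a fresh model's `syn0` is still an empty placeholder. `grow_weights` and its random initialization are hypothetical simplifications, not gensim's code.

```python
import numpy as np

VECTOR_SIZE = 100  # gensim's default vector size for Word2Vec


def grow_weights(syn0, n_new_words, vector_size=VECTOR_SIZE, seed=0):
    """Append rows for new words to an existing weight matrix, mimicking the
    vstack step from the traceback (hypothetical simplification)."""
    rng = np.random.default_rng(seed)
    # Small random initial vectors for the new words (simplified stand-in).
    new_rows = (rng.random((n_new_words, vector_size)) - 0.5) / vector_size
    return np.vstack([syn0, new_rows])


# Case 1: vocabulary already built, so syn0 is a 2-D (n_words, vector_size) matrix.
existing = np.zeros((5, VECTOR_SIZE))
grown = grow_weights(existing, 3)
print(grown.shape)  # (8, 100)

# Case 2: fresh model, assumed to hold an empty placeholder instead of a matrix.
try:
    grow_weights([], 3)
except ValueError as err:
    print("ValueError:", err)
```

Stacking onto the 2-D matrix works, while stacking onto the empty placeholder raises the same ValueError as in the question: `vstack` promotes `[]` to shape `(1, 0)`, and its 0 columns do not match the 100 columns of the new rows.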

This works:

import gensim

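model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
# First pass: build the vocabulary from scratch, then train.
model.build_vocab(sentences, update=False)
model.train(sentences)

# Later pass: extend the existing vocabulary, then continue training.
model.build_vocab(sentences, update=True)
model.train(sentences)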

But this will fail:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=True)  # fails: no existing vocabulary to update
model.train(sentences)

ValueError: all the input array dimensions except for the concatenation axis must match exactly

Tested with gensim 0.13.4.1, the latest version at the time.
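If you feed the model in batches, one way to avoid the error is to choose `update` based on whether a vocabulary already exists. The sketch below assumes the gensim 0.13.x attributes seen in this thread (`model.wv.vocab`, `train(sentences)`); `train_incrementally` and `StubWord2Vec` are hypothetical names, and the stub exists only so the example runs without gensim. With real gensim you would pass a `gensim.models.Word2Vec()` instead; later releases changed these APIs (4.x, for example, replaced `wv.vocab` with `wv.key_to_index`).

```python
class _StubVectors:
    """Holds a vocab dict, like `model.wv` in gensim 0.13.x (hypothetical stub)."""

    def __init__(self):
        self.vocab = {}


class StubWord2Vec:
    """Hypothetical stand-in for gensim.models.Word2Vec so the sketch runs
    without gensim. It mimics only the failure mode from the question."""

    def __init__(self):
        self.wv = _StubVectors()
        self.update_flags = []  # records the `update` value of each build_vocab call

    def build_vocab(self, sentences, update=False):
        if update and not self.wv.vocab:
            raise ValueError("update=True needs an existing vocabulary")
        self.update_flags.append(update)
        for sentence in sentences:
            for word in sentence:
                self.wv.vocab[word] = self.wv.vocab.get(word, 0) + 1

    def train(self, sentences):
        pass  # training itself is irrelevant to the vocab/update logic


def train_incrementally(model, sentences):
    """Build the vocabulary from scratch on the first batch, extend it afterwards."""
    has_vocab = len(model.wv.vocab) > 0
    model.build_vocab(sentences, update=has_vocab)
    model.train(sentences)


model = StubWord2Vec()
train_incrementally(model, [["hello", "world"]])   # first batch: update=False
train_incrementally(model, [["hello", "gensim"]])  # later batch: update=True
print(model.update_flags)  # [False, True]
```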

Kamil Sindi answered Oct 12 '22 at 21:10
