 

Implement word2vec in Keras

I would like to implement the word2vec algorithm in Keras. Is this possible? How can I fit the model? Should I use a custom loss function?

asked Oct 25 '16 by András


People also ask

How do you implement Word2Vec?

To implement Word2Vec, there are two flavors to choose from: Continuous Bag-of-Words (CBOW) and continuous Skip-gram (SG). In short, CBOW attempts to predict the target word from its neighbouring context words, whereas Skip-gram does the opposite and predicts the context words from the target word.
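To make the difference concrete, here is a minimal sketch in plain Python (the sentence and window size are purely illustrative) of the training pairs each flavor produces:

```python
# Sliding a window over a tokenized sentence to build training pairs.
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2  # context words on each side of the target

cbow_pairs = []      # (context words -> target word)
skipgram_pairs = []  # (target word -> each context word)

for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window),
                              min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))
    skipgram_pairs.extend((target, c) for c in context)

print(cbow_pairs[2])       # (['the', 'quick', 'fox', 'jumps'], 'brown')
print(skipgram_pairs[:4])  # ('the', 'quick'), ('the', 'brown'), ...
```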

Is BERT better than Word2Vec?

Word2Vec generates the same single vector for the word bank no matter which sentence it appears in. BERT, in contrast, generates two different vectors for bank when it is used in two different contexts: one vector will be similar to words like money and cash, while the other will be similar to words like beach and coast.


1 Answer

Is this possible?

You've already answered it yourself: yes. In addition to word2veckeras, which uses gensim, here's another CBOW implementation with no extra dependencies (for the record, I'm not affiliated with that repo). You can use both as examples.
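The core CBOW model is small enough to sketch directly. This is a minimal illustration, not the code from either repo; the vocabulary size, embedding dimension, and window are placeholder values, and it uses the current tf.keras functional API. The idea: average the context-word embeddings, then predict the center word with a softmax.

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000  # assumed vocabulary size
embed_dim = 100     # assumed embedding dimensionality
window = 2          # context words on each side, so 2 * window inputs

context_in = keras.Input(shape=(2 * window,), dtype="int32")
embedded = layers.Embedding(vocab_size, embed_dim)(context_in)  # (batch, 2w, d)
averaged = layers.GlobalAveragePooling1D()(embedded)            # (batch, d)
output = layers.Dense(vocab_size, activation="softmax")(averaged)

model = keras.Model(context_in, output)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```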

How can I fit the model?

Since the training data is a large corpus of sentences, the most convenient method is model.fit_generator, which "fits the model on data generated batch-by-batch by a Python generator". The generator runs indefinitely, yielding (context, target) CBOW tuples (or (target, context) Skip-gram pairs), and you bound the training by specifying samples_per_epoch and nb_epoch. This way you decouple the sentence analysis (tokenization, word index table, sliding window, etc.) from the actual Keras model, and save a lot of resources.
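A minimal sketch of such a generator, feeding the CBOW model above; corpus (an iterable of token lists), word_index, and the reserved padding id 0 are assumptions of this example, not part of any library:

```python
import numpy as np

def cbow_generator(corpus, word_index, vocab_size, window=2, batch_size=128):
    """Yield (context ids, one-hot target) batches forever; all sentence
    analysis (index lookup, sliding window) happens here, not in the model."""
    contexts, targets = [], []
    while True:  # loop forever; training length is bounded by the fit call
        for sentence in corpus:
            ids = [word_index[w] for w in sentence if w in word_index]
            for i, target in enumerate(ids):
                ctx = [ids[j]
                       for j in range(max(0, i - window),
                                      min(len(ids), i + window + 1))
                       if j != i]
                ctx += [0] * (2 * window - len(ctx))  # pad short contexts
                contexts.append(ctx)
                targets.append(target)
                if len(contexts) == batch_size:
                    x = np.array(contexts, dtype="int32")
                    y = np.zeros((batch_size, vocab_size), dtype="float32")
                    y[np.arange(batch_size), targets] = 1.0  # one-hot targets
                    yield x, y
                    contexts, targets = [], []

# Keras 1 API, as described above:
#   model.fit_generator(gen, samples_per_epoch=100000, nb_epoch=5)
# Current Keras accepts the generator in fit() directly:
#   model.fit(gen, steps_per_epoch=1000, epochs=5)
```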

Should I use custom loss function?

CBOW minimizes the distance between the predicted and true distributions of the center word, so in the simplest form categorical_crossentropy will do. If you implement negative sampling, which is a bit more complex yet much more efficient, the loss changes to binary_crossentropy. A custom loss function is unnecessary.
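For illustration, here is a sketch of the negative-sampling variant (sizes are placeholders): each (target, context) pair is scored with a dot product of two embeddings and trained as a binary classifier, which is exactly where binary_crossentropy comes in. Keras ships a helper, keras.preprocessing.sequence.skipgrams, that builds the labeled pairs.

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000  # assumed
embed_dim = 100     # assumed

target_in = keras.Input(shape=(1,), dtype="int32")
context_in = keras.Input(shape=(1,), dtype="int32")

target_emb = layers.Embedding(vocab_size, embed_dim)(target_in)    # (batch, 1, d)
context_emb = layers.Embedding(vocab_size, embed_dim)(context_in)  # (batch, 1, d)

score = layers.Dot(axes=2)([target_emb, context_emb])  # (batch, 1, 1)
score = layers.Flatten()(score)
output = layers.Activation("sigmoid")(score)           # P(pair is genuine)

model = keras.Model([target_in, context_in], output)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Pair generation: label 1 for observed (target, context) pairs,
# label 0 for sampled negatives.
# pairs, labels = keras.preprocessing.sequence.skipgrams(
#     sequence, vocabulary_size=vocab_size, window_size=2, negative_samples=1.0)
```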

For anyone interested in the details of the math and the probabilistic model, I highly recommend Stanford's CS224D class. Here are the lecture notes on word2vec, CBOW and Skip-Gram.

Another useful reference: a word2vec implementation in pure numpy and C.

answered Oct 19 '22 by Maxim