I would like to implement the word2vec algorithm in Keras. Is this possible? How can I fit the model? Should I use a custom loss function?
To implement Word2Vec, there are two flavors to choose from — Continuous Bag-Of-Words (CBOW) or continuous Skip-gram (SG). In short, CBOW attempts to guess the output (target word) from its neighbouring words (context words), whereas continuous Skip-gram guesses the context words from a target word.
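To make the difference concrete, here is a tiny pure-Python sketch of the training pairs each flavour would extract from one sentence (the sentence, window size and variable names are just made up for illustration):

```python
# Illustrative sketch of the training pairs each flavour extracts from one
# tokenized sentence with a symmetric window of 2 (all names are made up).
sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 2

cbow_pairs, sg_pairs = [], []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))           # many context words -> one target
    sg_pairs.extend((target, c) for c in context)  # one target -> each context word

print(cbow_pairs[2])   # (['the', 'quick', 'fox', 'jumps'], 'brown')
print(sg_pairs[:2])    # [('the', 'quick'), ('the', 'brown')]
```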
Is this possible?
You've already answered it yourself: yes. In addition to word2veckeras, which uses gensim, here's another CBOW implementation that doesn't have extra dependencies (just in case, I'm not affiliated with this repo). You can use them as examples.
How can I fit the model?
Since the training data is a large corpus of sentences, the most convenient method is model.fit_generator, which "fits the model on data generated batch-by-batch by a Python generator". The generator runs indefinitely, yielding (word, context, target) CBOW (or SG) tuples, but you manually specify samples_per_epoch and nb_epoch to limit the training. This way you decouple the sentence analysis (tokenization, word index table, sliding window, etc.) from the actual Keras model, and save a lot of resources.
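As a hedged sketch of what such a generator could look like for CBOW — `sentences` (an iterable of lists of word indices), `V` (the vocabulary size) and the batching details are all assumptions, and the call uses the old Keras 1.x argument names mentioned above:

```python
import numpy as np

# Hypothetical sketch of a CBOW batch generator; `sentences` is an iterable of
# lists of word indices, V the vocabulary size, `window` the context half-width.
def cbow_batches(sentences, V, window=2, batch_size=128):
    contexts, targets = [], []
    while True:  # run forever, as fit_generator expects
        for sent in sentences:
            for i, center in enumerate(sent):
                ctx = [sent[j] for j in range(max(0, i - window),
                                              min(len(sent), i + window + 1)) if j != i]
                ctx = (ctx + [0] * (2 * window))[:2 * window]  # pad/trim to fixed length
                one_hot = np.zeros(V)
                one_hot[center] = 1.0  # one-hot distribution of the center word
                contexts.append(ctx)
                targets.append(one_hot)
                if len(contexts) == batch_size:
                    yield np.array(contexts), np.array(targets)
                    contexts, targets = [], []

# Keras 1.x-style call; newer versions use model.fit(..., steps_per_epoch=..., epochs=...)
model.fit_generator(cbow_batches(sentences, V),
                    samples_per_epoch=100000, nb_epoch=10)
```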
Should I use custom loss function?
CBOW minimizes the distance between the predicted and true distributions of the center word, so in the simplest form categorical_crossentropy will do.
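For illustration, a minimal CBOW model sketch (vocabulary size `V`, embedding dimension `dim` and the window size are assumed values): average the context embeddings and predict the center word with a softmax trained on categorical_crossentropy.

```python
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
import keras.backend as K

V, dim, window = 10000, 100, 2  # assumed vocabulary size, embedding dim, context window

# Minimal CBOW sketch: average the context word embeddings and predict the
# center word with a softmax over the vocabulary.
model = Sequential([
    Embedding(input_dim=V, output_dim=dim, input_length=2 * window),
    Lambda(lambda x: K.mean(x, axis=1), output_shape=(dim,)),
    Dense(V, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```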
If you implement negative sampling, which is a bit more complex yet much more efficient, the loss function changes to binary_crossentropy. A custom loss function is unnecessary.
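A sketch of the skip-gram-with-negative-sampling variant, assuming Keras 2 layer names and the same hypothetical `V` and `dim`: score each (target, context) pair by the dot product of their embeddings and train it as a binary classifier on 1/0 labels.

```python
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Reshape, Activation

V, dim = 10000, 100  # assumed vocabulary size and embedding dimension

# One target word index and one (true or negatively sampled) context word index per example.
target_in = Input(shape=(1,), dtype='int32')
context_in = Input(shape=(1,), dtype='int32')

target_emb = Embedding(V, dim)(target_in)    # (batch, 1, dim)
context_emb = Embedding(V, dim)(context_in)  # (batch, 1, dim)

# Dot product of the two embeddings, squashed into a probability of "real pair".
score = Dot(axes=-1)([target_emb, context_emb])  # (batch, 1, 1)
score = Reshape((1,))(score)
output = Activation('sigmoid')(score)

model = Model(inputs=[target_in, context_in], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

The positive and negatively sampled (target, context) couples with their 1/0 labels can be generated with keras.preprocessing.sequence.skipgrams and fed straight into such a model.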
For anyone interested in the details of the math and the probabilistic model, I highly recommend Stanford's CS224D class. Here are the lecture notes about word2vec, CBOW and Skip-gram. Another useful reference: a word2vec implementation in pure numpy and in C.