
Tracking loss and embeddings in Gensim word2vec model

I'm fairly new to Gensim and am trying to train my first word2vec model. The parameters are straightforward, but I don't know how to track the model's loss to monitor training progress. I would also like to get the embeddings after each epoch, so that I can show the predictions becoming more sensible as training proceeds. How can I do that?

Or is it better to train with iter=1 repeatedly and save the loss and embeddings after each pass? That doesn't sound very efficient.

There isn't much code to show, but here it is:

from gensim.models import Word2Vec

model = Word2Vec(sentences=trainset,
                 iter=5,  # epochs
                 min_count=10,
                 size=150,
                 workers=4,
                 sg=1,
                 hs=1,
                 negative=0,
                 window=9999)
melowgs asked Dec 17 '22 18:12


1 Answer

Gensim supports callbacks for exactly this purpose.

Example:

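from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class MonitorCallback(CallbackAny2Vec):
    """Print the training loss and nearest neighbours after every epoch."""

    def __init__(self, test_words):
        self._test_words = test_words

    def on_epoch_end(self, model):
        # Needs compute_loss=True on the model; otherwise the reported loss stays at 0.
        print("Model loss:", model.get_latest_training_loss())
        for word in self._test_words:  # show how the neighbours change
            print(model.wv.most_similar(word))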
# prepare datasets etc.
# ...
# ...

monitor = MonitorCallback(["word", "I", "less"])  # monitor with demo words
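model = Word2Vec(sentences=trainset,
                 iter=5,  # epochs (named `epochs` in gensim 4.x)
                 min_count=10,
                 size=150,  # named `vector_size` in gensim 4.x
                 workers=4,
                 sg=1,
                 hs=1,
                 negative=0,
                 window=9999,
                 compute_loss=True,  # needed for get_latest_training_loss()
                 callbacks=[monitor])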
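The number printed in on_epoch_end comes straight from get_latest_training_loss(). Assuming that value is a running total over the whole training call rather than a per-epoch figure (worth checking against your gensim version), the loss for a single epoch is the difference between two consecutive readings. A minimal sketch in plain Python follows; the name EpochLossTracker and the sample numbers are made up for illustration and are not part of gensim:

```python
class EpochLossTracker:
    """Hypothetical helper: turn a running-total loss into per-epoch losses."""

    def __init__(self):
        self._previous_total = 0.0
        self.per_epoch = []

    def update(self, running_total):
        # Loss attributable to the epoch that has just finished.
        epoch_loss = running_total - self._previous_total
        self._previous_total = running_total
        self.per_epoch.append(epoch_loss)
        return epoch_loss


# Inside the callback one could call, for example:
#     epoch_loss = tracker.update(model.get_latest_training_loss())
tracker = EpochLossTracker()
for total in [10.0, 18.0, 24.0]:  # made-up running totals, illustration only
    tracker.update(total)
print(tracker.per_epoch)
```

With this, a total that keeps growing can still correspond to per-epoch losses that shrink.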
  • There seem to be some issues with get_latest_training_loss: the value it returns may be incorrect (GitHub is down right now, so I can't check). I tested this code and the reported loss increases from epoch to epoch, which looks odd.
  • You may prefer logging instead; gensim is well set up for it.
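For the logging route, a minimal sketch using only the standard library is shown below. It assumes gensim's modules log under names beginning with "gensim" (they create their loggers from the module name), so raising that logger to INFO surfaces the training progress messages; the format string is just one reasonable choice.

```python
import logging

# Loggers created by gensim's modules are children of the "gensim" logger.
gensim_logger = logging.getLogger("gensim")
gensim_logger.setLevel(logging.INFO)

# Send the messages to the console with a timestamp and level.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s : %(levelname)s : %(message)s")
)
gensim_logger.addHandler(handler)
```

Run this before constructing the model. If the root logger already has its own handler, messages may appear twice, because they also propagate upwards.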
Mikhail Stepanov answered Dec 29 '22 07:12