I am trying to understand the epochs parameter in the Doc2Vec constructor and the epochs parameter in the train() method.
In the following code snippet, I manually set up a loop of 4000 iterations. Is that required, or is passing 4000 as the epochs parameter to Doc2Vec enough? Also, how is epochs in Doc2Vec different from epochs in train()?
documents = Documents(train_set)
model = Doc2Vec(vector_size=100, dbow_words=1, dm=0, epochs=4000, window=5,
                seed=1337, min_count=5, workers=4, alpha=0.001, min_alpha=0.025)
model.build_vocab(documents)
for epoch in range(model.epochs):
    print("epoch " + str(epoch))
    model.train(documents, total_examples=total_length, epochs=1)
    ckpnt = model_name + "_epoch_" + str(epoch)
    model.save(ckpnt)
    print("Saving {}".format(ckpnt))
Also, how and when are the weights updated?
You don't have to run the loop manually, and you shouldn't call train() more than once unless you're an expert who needs to do so for very specific reasons. If you've seen this technique in some online example you're copying, that example is likely outdated and misleading.
Call train() once, with your preferred number of passes as the epochs parameter.
Also, don't use a starting alpha learning-rate that is low (0.001) and then rises to a min_alpha value 25 times larger (0.025) - that's not how this is supposed to work, and most users shouldn't need to adjust the alpha-related defaults at all. (Again, if you're getting this from an online example somewhere - that's a bad example. Let them know they're giving bad advice.)
Also, 4000 training epochs is absurdly large. A value of 10-20 is common in published work, when dealing with tens-of-thousands to millions of documents. If your dataset is smaller, it may not work well with Doc2Vec, though sometimes more epochs (or a smaller vector_size) can still learn something generalizable from tiny data - even then, expect to use closer to dozens of epochs, not thousands.
A good intro (albeit with a tiny dataset that barely works with Doc2Vec) is the doc2vec-lee.ipynb Jupyter notebook that's bundled with gensim, and also viewable online at:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
Good luck!