Is there a way to get the document vectors of unseen and seen documents from Doc2Vec in the gensim 0.11.1 version?
For example, suppose I trained the model on 1000 documents. Can I get the doc vectors for those 1000 docs?
Is there a way to get document vectors for unseen documents composed of the same vocabulary?
What does the size parameter of Doc2Vec mean in technical terms? A size of 100 means that the vector representing each document contains 100 elements (100 values). The vector maps the document to a point in 100-dimensional space; a size of 200 would map it to a point in 200-dimensional space.
The Doc2Vec model is based on Word2Vec; the only addition is one more vector (the paragraph ID) in the input.
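To make the two points above concrete, here is a toy sketch in plain Python. It is not gensim's implementation: the helper names (make_vector, combine_input) are made up for illustration, the vectors are random rather than trained, and averaging is just one way the paragraph vector can be combined with the context word vectors.

```python
import random

def make_vector(size, rng):
    """Return a random vector with `size` elements (stand-in for a trained one)."""
    return [rng.uniform(-0.5, 0.5) for _ in range(size)]

def combine_input(doc_vector, context_word_vectors):
    """Average the paragraph vector with the context word vectors.

    This mimics the idea that Doc2Vec feeds one extra vector (the
    paragraph ID's vector) into the same kind of input Word2Vec uses.
    """
    vectors = [doc_vector] + context_word_vectors
    size = len(doc_vector)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]

rng = random.Random(0)
size = 100  # each document and word is a point in 100-dimensional space

word_vectors = {w: make_vector(size, rng) for w in ["people", "like", "words"]}
doc_vectors = {"SENT_3": make_vector(size, rng)}

# Input used to predict the next word from the document plus two context words.
combined = combine_input(
    doc_vectors["SENT_3"],
    [word_vectors["people"], word_vectors["like"]],
)

print(len(doc_vectors["SENT_3"]))  # number of elements in the document vector
print(len(combined))               # the combined input has the same size
```

Changing size to 200 gives every document and word vector 200 elements instead, with no other changes to the code.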
For the first bullet point, you can do it in gensim 0.11.1:
from gensim.models import Doc2Vec
from gensim.models.doc2vec import LabeledSentence
documents = []
documents.append( LabeledSentence(words=[u'some', u'words', u'here'], labels=[u'SENT_1']) )
documents.append( LabeledSentence(words=[u'some', u'people', u'words', u'like'], labels=[u'SENT_2']) )
documents.append( LabeledSentence(words=[u'people', u'like', u'words'], labels=[u'SENT_3']) )
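# size is the number of elements in each vector; min_count=0 keeps every word
model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
model.build_vocab(documents)
model.train(documents)
# look up the trained vector of a document by its label
print(model[u'SENT_3'])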
Here SENT_3 is a known sentence.
For the second bullet point, you cannot do it in gensim 0.11.1; you have to upgrade to 0.12.4. That version has an infer_vector function, which can generate a vector for an unseen document:
from gensim.models import Doc2Vec
from gensim.models.doc2vec import LabeledSentence
documents = []
documents.append( LabeledSentence([u'some', u'words', u'here'], [u'SENT_1']) )
documents.append( LabeledSentence([u'some', u'people', u'words', u'like'], [u'SENT_2']) )
documents.append( LabeledSentence([u'people', u'like', u'words'], [u'SENT_3']) )
model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
model.build_vocab(documents)
model.train(documents)
print(model.docvecs[u'SENT_3']) # look up the trained vector for a known sentence
print(model.infer_vector([u'people', u'like', u'words'])) # infer a vector for an unseen sentence