 

What is the difference between doc2vec models when dbow_words is set to 1 or 0?

Tags: gensim, doc2vec

I read this page, but I do not understand the difference between models built with the following code. I know that when dbow_words is 0, training of the doc-vectors is faster.

First model

model = doc2vec.Doc2Vec(documents1, size=100, window=300, min_count=10, workers=4)

Second model

model = doc2vec.Doc2Vec(documents1, size=100, window=300, min_count=10, workers=4, dbow_words=1)
asked May 16 '17 by user3092781


1 Answer

The dbow_words parameter only has effect when training a DBOW model – that is, with the non-default dm=0 parameter.

So, between your two example lines of code, which both leave the default dm=1 value unchanged, there's no difference.

If you instead switch to DBOW training, dm=0, then with a default dbow_words=0 setting, the model is pure PV-DBOW as described in the original 'Paragraph Vectors' paper. Doc-vectors are trained to be predictive of text example words, but no word-vectors are trained. (There'll still be some randomly-initialized word-vectors in the model, but they're not used or improved during training.) This mode is fast and still works pretty well.
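
For concreteness, a minimal sketch of that pure PV-DBOW mode, reusing the question's corpus and older-gensim parameter names (documents1, size), might look like:

from gensim.models import doc2vec

# Pure PV-DBOW: dm=0, with dbow_words left at its default of 0.
# Only doc-vectors are trained; the model's word-vectors remain at their
# random initialization and are never consulted or updated.
# window is omitted because pure DBOW does no sliding-window word training.
model_dbow = doc2vec.Doc2Vec(documents1, dm=0, size=100,
                             min_count=10, workers=4)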

If you add the dbow_words=1 setting, then skip-gram word-vector training is added, interleaved with the doc-vector training. (For each text example, both the doc-vector over the whole text and word-vectors over each sliding context window will be trained.) Since this adds more training examples, as a function of the window parameter, it will be significantly slower. (For example, with window=5, adding word-training will make training about 5x slower.)
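
A sketch of that combined mode, again with the question's corpus and older parameter names; the small window=5 is only an illustrative choice to keep the added word-training cost modest:

from gensim.models import doc2vec

# DBOW doc-vector training plus interleaved skip-gram word-vector training.
# Each text now also generates word-training examples per sliding context
# window, so a larger window means proportionally slower training.
model_dbow_w = doc2vec.Doc2Vec(documents1, dm=0, dbow_words=1, size=100,
                               window=5, min_count=10, workers=4)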

This has the benefit of placing both the DBOW doc-vectors and the word-vectors into the "same space" - perhaps making the doc-vectors more interpretable by their closeness to words.
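
One way to probe that shared space, once such a model is trained; the tag 'doc_0' below is hypothetical, so substitute a real tag from your corpus (newer gensim renames docvecs to dv):

# Which word-vectors lie closest to a given trained doc-vector?
doc_vec = model_dbow_w.docvecs['doc_0']   # look up a doc-vector by its tag
print(model_dbow_w.wv.similar_by_vector(doc_vec, topn=10))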

This mixed training might serve as a sort of corpus-expansion – turning each context-window into a mini-document – that helps improve the expressiveness of the resulting doc-vector embeddings. (Though, especially with sufficiently large and diverse document sets, it may be worth comparing against pure-DBOW with more passes.)
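
For the comparison suggested above, a pure-DBOW run with more passes might look like the following sketch (iter is the older gensim name for the number of passes; newer releases call it epochs):

# Pure DBOW again, but with more training passes over the corpus, as a
# baseline to compare against the slower dbow_words=1 run.
model_dbow_more = doc2vec.Doc2Vec(documents1, dm=0, size=100,
                                  min_count=10, workers=4, iter=20)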

answered Dec 29 '22 by gojomo