
doc2vec: How is PV-DBOW implemented?

I know that there is already an implementation of PV-DBOW (Paragraph Vector) in Python (gensim). But I'm interested in knowing how to implement it myself. The explanation from the official paper for PV-DBOW is as follows:

Another way is to ignore the context words in the input, but force the model to predict words randomly sampled from the paragraph in the output. In reality, what this means is that at each iteration of stochastic gradient descent, we sample a text window, then sample a random word from the text window and form a classification task given the Paragraph Vector.

According to the paper the word vectors are not stored and PV-DBOW is said to work similar to skip gram in word2vec.

Skip-gram is explained in word2vec Parameter Learning. In the skip-gram model the word vectors are mapped to the hidden layer, and the matrix that performs this mapping is updated during training. In PV-DBOW the dimension of the hidden layer should be the dimension of one paragraph vector. When I want to multiply the word vector of a sampled example with the paragraph vector, they should have the same size. The original representation of a word is of size (vocabulary size x 1). Which mapping is performed to get the right size (paragraph dimension x 1) in the hidden layer, and how is this mapping performed when the word vectors are not stored? I assume that the word and paragraph representations should have the same size in the hidden layer because of equation 26 in word2vec Parameter Learning.
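
For concreteness, here is a minimal sketch of the dimensions as I understand them (the sizes V, N, P and the matrix names below are toy assumptions, not notation from either paper):

import numpy as np

V = 10    # toy vocabulary size (assumption)
N = 5     # embedding / hidden-layer size (assumption)
P = 100   # number of paragraphs (assumption)

# skip-gram: the one-hot word (V x 1) is mapped to the hidden layer
# by the input matrix W (V x N); the hidden layer is a row of W
W = np.random.randn(V, N) * 0.01
one_hot = np.zeros(V)
one_hot[3] = 1.0
hidden_sg = W.T @ one_hot          # shape (N,)

# PV-DBOW: the hidden layer is simply the paragraph vector itself,
# so no word-level input matrix is needed
D = np.random.randn(P, N) * 0.01   # paragraph vectors
hidden_dbow = D[42]                # shape (N,), same size as hidden_sg

# both are scored against the output matrix of shape (N x V); the
# "word vectors" of the output layer are its columns
W_prime = np.random.randn(N, V) * 0.01
scores = hidden_dbow @ W_prime     # shape (V,): one score per word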

asked Mar 15 '16 by саша

1 Answer

Yes, PV-DBOW can be easily implemented on top of the word2vec skip-gram model.

Say you have the following sentence:

Children are running in the park

The skip-gram model tries to predict the surrounding words within a fixed-size window in order to learn word vectors. If the window size is 2, the objective is the following:

word ->  context words to predict
--------------------------------
Children -> (are, running)
are -> (children, running, in)
running -> (children, are, in, the)
in -> (are, running, the, park)
the -> (running, in, park)
park -> (in, the)
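
For illustration, a small sketch (my own toy helper, not part of any library) that reproduces the pairs in the table above:

def skipgram_pairs(tokens, window=2):
    # for each word, collect the context words within `window`
    # positions on each side
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((center, context))
    return pairs

sentence = "Children are running in the park".split()
for center, context in skipgram_pairs(sentence, window=2):
    print(center, "->", tuple(context))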

Now, you can simply modify how the word -> context data is fed to your skip-gram implementation, like so:

word ->  context words to predict
--------------------------------
PAR#33 -> (Children, are, running, in, the, park)

PAR#33, which is just another word as far as your model is concerned (its vector has the same length), is in reality a token representing the whole paragraph (sentence).
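
As an illustration, here is a small sketch of how the training pairs change for PV-DBOW (the tag naming scheme PAR#33 and the helper name are assumptions for the example):

def pv_dbow_pairs(paragraphs):
    # paragraphs: list of (tag, tokens) tuples; every paragraph token
    # is paired with all the words of its paragraph, instead of a
    # center word with its window
    pairs = []
    for tag, tokens in paragraphs:
        for word in tokens:
            pairs.append((tag, word))   # feed these to the skip-gram trainer
    return pairs

docs = [("PAR#33", "Children are running in the park".split())]
print(pv_dbow_pairs(docs))
# [('PAR#33', 'Children'), ('PAR#33', 'are'), ('PAR#33', 'running'),
#  ('PAR#33', 'in'), ('PAR#33', 'the'), ('PAR#33', 'park')]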

It's kind of a skip-gram model with a "paragraph-sized window".
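
If you want to implement it yourself, a minimal sketch of one PV-DBOW training step with negative sampling could look like this (the sizes, learning rate and word ids below are toy assumptions; only the paragraph vectors and the output word vectors are stored, which is why no input word vectors are needed):

import numpy as np

rng = np.random.default_rng(0)
V, N, P = 1000, 50, 200              # toy vocab size, vector size, #paragraphs
D = (rng.random((P, N)) - 0.5) / N   # paragraph vectors (the thing we want)
W_out = np.zeros((V, N))             # output word vectors
lr, neg_k = 0.025, 5                 # learning rate, #negative samples

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(par_id, word_id):
    # one SGD step: paragraph `par_id` predicts `word_id`
    d = D[par_id]
    # the sampled word gets label 1, random negatives get label 0
    targets = [(word_id, 1.0)] + [(rng.integers(V), 0.0) for _ in range(neg_k)]
    grad_d = np.zeros(N)
    for w, label in targets:
        score = sigmoid(d @ W_out[w])
        g = score - label            # gradient of the log-loss wrt the score
        grad_d += g * W_out[w]       # accumulate before updating W_out[w]
        W_out[w] -= lr * g * d       # update the output word vector
    D[par_id] -= lr * grad_d         # update the paragraph vector

# e.g. paragraph #33 predicting each word of "Children are running in the park"
for w in [12, 7, 301, 4, 2, 88]:     # toy word ids (assumption)
    train_pair(33, w)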

answered Sep 24 '22 by Cedias