
doc2vec: How is PV-DBOW implemented?

I know that there is already an implementation of PV-DBOW (Paragraph Vector) in Python (gensim). But I'm interested in knowing how to implement it myself. The explanation from the official paper for PV-DBOW is as follows:

Another way is to ignore the context words in the input, but force the model to predict words randomly sampled from the paragraph in the output. In reality, what this means is that at each iteration of stochastic gradient descent, we sample a text window, then sample a random word from the text window and form a classification task given the Paragraph Vector.

According to the paper the word vectors are not stored and PV-DBOW is said to work similar to skip gram in word2vec.

Skip-gram is explained in word2vec Parameter Learning. In the skip-gram model the word vectors are mapped to the hidden layer, and the matrix that performs this mapping is updated during training. In PV-DBOW the dimension of the hidden layer should be the dimension of one paragraph vector. When I want to multiply the word vector of a sampled example with the paragraph vector, they should have the same size. The original representation of a word is of size (vocabulary size x 1). Which mapping is performed to get the right size (paragraph dimension x 1) in the hidden layer, and how is this mapping performed when the word vectors are not stored? I assume that the word and paragraph representations should have the same size in the hidden layer because of equation 26 in word2vec Parameter Learning.
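
For concreteness, here is a minimal sketch of the dimensions as I understand them (the sizes V, N, P and the matrix names below are toy assumptions, not notation from either paper):

import numpy as np

V = 10    # toy vocabulary size (assumption)
N = 5     # embedding / hidden-layer size (assumption)
P = 100   # number of paragraphs (assumption)

# skip-gram: the one-hot word (V x 1) is mapped to the hidden layer
# by the input matrix W (V x N); the hidden layer is a row of W
W = np.random.randn(V, N) * 0.01
one_hot = np.zeros(V)
one_hot[3] = 1.0
hidden_sg = W.T @ one_hot          # shape (N,)

# PV-DBOW: the hidden layer is simply the paragraph vector itself,
# so no word-level input matrix is needed
D = np.random.randn(P, N) * 0.01   # paragraph vectors
hidden_dbow = D[42]                # shape (N,), same size as hidden_sg

# both are scored against the output matrix of shape (N x V); the
# "word vectors" of the output layer are its columns
W_prime = np.random.randn(N, V) * 0.01
scores = hidden_dbow @ W_prime     # shape (V,): one score per word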

asked Mar 15 '16 by саша

1 Answer

Yes, PV-DBOW can be easily implemented on top of the word2vec skip-gram model.

Say you have the following sentence:

Children are running in the park

The skip-gram model tries to predict the surrounding words within a fixed-size window in order to learn word vectors. If the window size is 2, the objective is the following:

word ->  context words to predict
--------------------------------
Children -> (are, running)
are -> (children, running, in)
running -> (children, are, in, the)
in -> (are, running, the, park)
the -> (running, in, park)
park -> (in, the)
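
For illustration, a small sketch (my own toy helper, not part of any library) that reproduces the pairs in the table above:

def skipgram_pairs(tokens, window=2):
    # for each word, collect the context words within `window`
    # positions on each side
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((center, context))
    return pairs

sentence = "Children are running in the park".split()
for center, context in skipgram_pairs(sentence, window=2):
    print(center, "->", tuple(context))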

Now, you can simply modify how the word -> context data is fed to your skip-gram implementation, like so:

word ->  context words to predict
--------------------------------
PAR#33 -> (Children, are, running, in, the, park)

PAR#33, which is just another word as far as your model is concerned (its vector has the same length), is in reality a token representing the whole paragraph (sentence).
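
As an illustration, here is a small sketch of how the training pairs change for PV-DBOW (the tag naming scheme PAR#33 and the helper name are assumptions for the example):

def pv_dbow_pairs(paragraphs):
    # paragraphs: list of (tag, tokens) tuples; every paragraph token
    # is paired with all the words of its paragraph, instead of a
    # center word with its window
    pairs = []
    for tag, tokens in paragraphs:
        for word in tokens:
            pairs.append((tag, word))   # feed these to the skip-gram trainer
    return pairs

docs = [("PAR#33", "Children are running in the park".split())]
print(pv_dbow_pairs(docs))
# [('PAR#33', 'Children'), ('PAR#33', 'are'), ('PAR#33', 'running'),
#  ('PAR#33', 'in'), ('PAR#33', 'the'), ('PAR#33', 'park')]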

It's kind of a skip-gram model with a "paragraph-sized window".
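
If you want to implement it yourself, a minimal sketch of one PV-DBOW training step with negative sampling could look like this (the sizes, learning rate and word ids below are toy assumptions; only the paragraph vectors and the output word vectors are stored, which is why no input word vectors are needed):

import numpy as np

rng = np.random.default_rng(0)
V, N, P = 1000, 50, 200              # toy vocab size, vector size, #paragraphs
D = (rng.random((P, N)) - 0.5) / N   # paragraph vectors (the thing we want)
W_out = np.zeros((V, N))             # output word vectors
lr, neg_k = 0.025, 5                 # learning rate, #negative samples

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(par_id, word_id):
    # one SGD step: paragraph `par_id` predicts `word_id`
    d = D[par_id]
    # the sampled word gets label 1, random negatives get label 0
    targets = [(word_id, 1.0)] + [(rng.integers(V), 0.0) for _ in range(neg_k)]
    grad_d = np.zeros(N)
    for w, label in targets:
        score = sigmoid(d @ W_out[w])
        g = score - label            # gradient of the log-loss wrt the score
        grad_d += g * W_out[w]       # accumulate before updating W_out[w]
        W_out[w] -= lr * g * d       # update the output word vector
    D[par_id] -= lr * grad_d         # update the paragraph vector

# e.g. paragraph #33 predicting each word of "Children are running in the park"
for w in [12, 7, 301, 4, 2, 88]:     # toy word ids (assumption)
    train_pair(33, w)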

answered Sep 24 '22 by Cedias