How does gensim calculate doc2vec paragraph vectors

Tags: vectorization, nlp, gensim, word2vec, doc2vec

I am going through this paper: http://cs.stanford.edu/~quocle/paragraph_vector.pdf

and it states that

"The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors."

How does concatenation or averaging work?

Example (if paragraph 1 contains word1 and word2):

word1 vector = [0.1, 0.2, 0.3]
word2 vector = [0.4, 0.5, 0.6]

Concatenation method:
does paragraph vector = [0.1+0.4, 0.2+0.5, 0.3+0.6]?

Averaging method:
does paragraph vector = [(0.1+0.4)/2, (0.2+0.5)/2, (0.3+0.6)/2]?

Also, regarding a figure from the paper (the image itself is not preserved here), it is stated that:

The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM).

Is the paragraph token equal to the paragraph vector which is equal to on?


328

asked Nov 04 '16 01:11

jxn

1 Answer

How does concatenation or averaging work?

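Your arithmetic is right for the average. Concatenation does not add the components; it joins the vectors end to end: [0.1,0.2,0.3,0.4,0.5,0.6]. Note that, per the passage quoted from the paper, these operations combine the paragraph vector with the context word vectors to build the input for predicting the next word. The paragraph vector itself is a separate vector learned during training, not the result of averaging or concatenating the word vectors.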
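To make the difference concrete, here is a minimal NumPy sketch of the two ways of combining vectors. The word vectors are the ones from the question; the `paragraph` vector values are made up purely for illustration, and this is not gensim's actual implementation.

```python
import numpy as np

# Word vectors taken from the question.
word1 = np.array([0.1, 0.2, 0.3])
word2 = np.array([0.4, 0.5, 0.6])

# Hypothetical paragraph vector (values invented for this example).
paragraph = np.array([0.7, 0.8, 0.9])

# Combining only the two word vectors, as in the question.
words_averaged = np.mean([word1, word2], axis=0)      # element-wise mean, length 3
words_concatenated = np.concatenate([word1, word2])   # joined end to end, length 6

# Combining the paragraph vector with the word vectors, as the paper describes.
averaged = np.mean([paragraph, word1, word2], axis=0)     # length stays 3
concatenated = np.concatenate([paragraph, word1, word2])  # length grows to 9

print(words_averaged)
print(words_concatenated)
print(averaged.shape, concatenated.shape)
```

Averaging keeps the dimensionality fixed, while concatenation grows with the number of vectors combined. In gensim, the `Doc2Vec` constructor exposes `dm_mean` and `dm_concat` parameters that control how the vectors are combined in PV-DM mode.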

Is the paragraph token equal to the paragraph vector which is equal to on?

The "paragraph token" is mapped to a vector that is called "paragraph vector". It is different from the token "on", and different from the word vector that the token "on" is mapped to.
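As a rough sketch of that mapping, the snippet below uses two separate lookup tables, one for words and one for paragraphs. The vocabulary, the paragraph names, the matrix names `W` and `D`, and the random values are all assumptions made for illustration; in a real model both matrices are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
vector_size = 3

# Hypothetical vocabularies: tokens and paragraph ids each get an index.
word_to_index = {"the": 0, "cat": 1, "sat": 2, "on": 3}
paragraph_to_index = {"paragraph_1": 0, "paragraph_2": 1}

# Separate embedding matrices: one row per word, one row per paragraph.
# Random values stand in for trained weights.
W = rng.normal(size=(len(word_to_index), vector_size))
D = rng.normal(size=(len(paragraph_to_index), vector_size))

# The paragraph token is looked up in D; the word "on" is looked up in W.
paragraph_vector = D[paragraph_to_index["paragraph_1"]]
on_vector = W[word_to_index["on"]]

# Model input for predicting the next word from a context window:
# the paragraph vector concatenated with the context word vectors.
context = ["the", "cat", "sat"]
context_vectors = [W[word_to_index[w]] for w in context]
model_input = np.concatenate([paragraph_vector, *context_vectors])

print(model_input.shape)
```

Here the paragraph vector is part of the input, alongside the context words, whereas "on" plays the role of the word being predicted, so the two are distinct things.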

114

answered Oct 27 '22 22:10

Franck Dernoncourt
