I am using the Doc2Vec class from gensim in Python to convert a document to a vector. An example of usage:

model = Doc2Vec(documents, size=100, window=8, min_count=5, workers=4)

How should I interpret the size parameter? I know that if I set size = 100, the length of the output vector will be 100, but what does that mean? For instance, if I increase size to 200, what is the difference?
Word2Vec learns a distributed representation of a word, which essentially means that multiple neurons capture a single concept (a concept can be word meaning, sentiment, part of speech, etc.), and a single neuron also contributes to multiple concepts.
These concepts are learnt automatically rather than pre-defined, hence you can think of them as latent/hidden. For the same reason, the word vectors can be used for multiple applications.
The larger the size parameter, the greater the capacity of your neural network to represent these concepts, but more data is required to train these vectors (since they are initialised randomly). In the absence of a sufficient number of sentences or computing power, it is better to keep size small.
Doc2Vec uses a slightly different neural network architecture than Word2Vec, but the meaning of size is analogous.
The difference is in the level of detail the model can capture. Generally, the more dimensions you give Word2Vec, the better the model, up to a certain point.
The size is typically between 100 and 300. Keep in mind that more dimensions also mean more memory is needed.
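To make the memory trade-off concrete: the input embedding matrix alone holds vocabulary_size × size float32 values, so doubling size doubles that footprint. A small back-of-the-envelope sketch (the 100k vocabulary is an assumption for illustration, and gensim also keeps output weights, roughly doubling the total):

```python
def word_vector_bytes(vocab_size: int, size: int, bytes_per_float: int = 4) -> int:
    """Approximate memory for the input embedding matrix alone."""
    return vocab_size * size * bytes_per_float

# Hypothetical 100k-word vocabulary:
for size in (100, 200, 300):
    mb = word_vector_bytes(100_000, size) / 1e6
    print(f"size={size}: ~{mb:.0f} MB")  # 40, 80, 120 MB
```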