Does Word2Vec have a hidden layer?

While reading one of Tomas Mikolov's papers, http://arxiv.org/pdf/1301.3781.pdf, I ran into a question about the Continuous Bag-of-Words Model section:

The first proposed architecture is similar to the feedforward NNLM, where the non-linear hidden layer is removed and the projection layer is shared for all words (not just the projection matrix); thus, all words get projected into the same position (their vectors are averaged).

I have seen some people mention that there is a hidden layer in the Word2Vec model, but as I understand it, that model contains only a projection layer. Does this projection layer do the same work as a hidden layer?

My other question is: how is the input data projected into the projection layer?

"the projection layer is shared for all words (not just the projection matrix)", what does that mean?

asked Oct 27 '15 by Kun

People also ask

How many layers is Word2Vec?

Word2Vec is a shallow, two-layer neural network that is trained to reconstruct the linguistic contexts of words.

How many hidden layers are there in a Word2Vec word embedding model?

Word embeddings are created using a neural network with one input layer, one hidden layer and one output layer.

What are the features in Word2Vec?

Word2vec is a two-layer neural net that processes text by “vectorizing” words. Its input is a text corpus and its output is a set of vectors: feature vectors that represent words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep neural networks can understand.
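As a quick illustration of this corpus-in, vectors-out pipeline, here is a minimal sketch using the gensim library (assuming gensim >= 4.0; the toy corpus and all parameter values below are illustrative choices, not recommendations):

    from gensim.models import Word2Vec

    # A tiny toy corpus: a list of tokenized sentences.
    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "chased", "the", "cat"],
    ]

    # sg=0 selects the CBOW architecture discussed above;
    # vector_size is the dimensionality of the learned vectors.
    model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

    # The learned feature vector for a word in the corpus:
    vec = model.wv["cat"]   # a NumPy array of length 50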


1 Answer

From the original paper, section 3.1, it is clear that there is no hidden layer:

"the first proposed architecture is similar to the feedforward NNLM where the non-linear hidden layer is removed and the projection layer is shared for all words".

With respect to your second question (what sharing the projection layer means): instead of keeping a separate position for each context word, you consider only one single vector, the centroid (average) of the vectors of all the words in the context. Thus, instead of feeding n-1 word vectors into the next layer, you feed only one vector. This is why the model is called Continuous Bag of Words: word order is lost within the context of size n-1.
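To make this concrete, here is a minimal NumPy sketch of the CBOW projection step (the vocabulary size, vector dimension, and random weights are toy values, not from the paper): each context word indexes a row of the input weight matrix, and those rows are averaged into a single vector, with no non-linearity applied before the output layer.

    import numpy as np

    vocab_size, dim = 10, 4
    rng = np.random.default_rng(0)

    # Input -> projection weights: one row per vocabulary word.
    # This matrix is the word-embedding table being learned.
    W_in = rng.normal(size=(vocab_size, dim))

    # Projection -> output weights, producing a score per word.
    W_out = rng.normal(size=(dim, vocab_size))

    # Indices of the n-1 context words around the target word.
    context = [2, 5, 7, 1]

    # The shared projection: look up each context word's vector
    # and average them. Order does not matter ("bag of words").
    h = W_in[context].mean(axis=0)    # shape: (dim,)

    # No non-linear hidden layer: h goes straight to the softmax.
    scores = h @ W_out                # shape: (vocab_size,)
    probs = np.exp(scores) / np.exp(scores).sum()

Because the context vectors are averaged into one position, the next layer always sees a single projected vector, no matter how many context words there are.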

answered Oct 09 '22 by Antoine