I came across these slides from Kim's EMNLP 2014 presentation on CNNs with word2vec: http://www.people.fas.harvard.edu/~yoonkim/data/Kim_EMNLP_2014_slides.pdf
On slide 20, the fourth bullet point reads:
> Words not in word2vec are initialized randomly from U[−a, a], where a is chosen such that the unknown words have the same variance as words already in word2vec.
Now I am wondering how "a" is computed, and also how the entire vector for an entirely unknown word is computed.
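One straightforward reading of the slide is a variance-matching argument: a uniform distribution U[−a, a] has variance a²/3, so setting a = sqrt(3 · var) makes the random vectors match the empirical variance of the pre-trained ones. A minimal sketch (the embedding matrix here is stand-in data, not Kim's actual vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained word2vec embedding matrix,
# shape (vocab_size, dim); in practice this is loaded from disk.
known_vectors = rng.normal(scale=0.1, size=(1000, 300))

# Var of U[-a, a] is a^2 / 3, so matching the empirical variance
# of the known vectors gives a = sqrt(3 * var).
var = known_vectors.var()
a = np.sqrt(3.0 * var)

# Initialize an unknown word uniformly in [-a, a] per dimension.
unk_vector = rng.uniform(-a, a, size=known_vectors.shape[1])
```

By construction the expected variance of `unk_vector` equals `var`, which is how I read "same variance as words already in word2vec".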
According to an answer by Mikolov himself, you can initialize the vector for an unknown word based on the space occupied by the infrequent words: he suggests averaging the vectors of the infrequent words to build the unknown token.
Following this idea, I think a refers to the radius of the infrequent-words space. What you could do is compute the centroid C of the infrequent words (as their mean), take the diameter 2a of the infrequent vector space Q, and generate a random vector u by sampling uniformly within Q.
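The centroid idea above could be sketched like this, treating Q as a ball of radius a around C (an assumption on my part; the vectors of the "infrequent words" are stand-in data here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the vectors of the infrequent words (assumed collected
# from the pre-trained model beforehand), shape (n_words, dim).
infrequent = rng.normal(size=(50, 300))

# Centroid C of the infrequent words (their mean).
C = infrequent.mean(axis=0)

# Radius a: largest distance from the centroid to any infrequent vector,
# so the diameter of Q is 2*a.
a = np.linalg.norm(infrequent - C, axis=1).max()

# Sample u uniformly inside the ball Q: pick a uniform random direction,
# then a radius r with density proportional to r^(d-1) so points are
# uniform in volume, not clustered near the centre.
d = infrequent.shape[1]
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
r = a * rng.random() ** (1.0 / d)
u = C + r * direction
```

Note that in high dimensions r ** (1/d) concentrates near a, so uniform samples from Q lie close to its surface; whether that is desirable for an unknown-word vector is a separate question.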