My question is two-fold, and hopefully not too complicated; both parts pertain specifically to the Skip-Gram model in Word2Vec:
The first part is about structure: as far as I understand it, the Skip-Gram model is based on one neural network with one input weight matrix W, one hidden layer of size N, and C output weight matrices W' each used to produce one of the C output vectors. Is this correct?
The second part is about the output vectors: as far as I understand it, each output vector is of size V and is a result of a Softmax function. Each output vector node corresponds to the index of a word in the vocabulary, and the value of each node is the probability that the corresponding word occurs at that context location (for a given input word). The target output vectors are not, however, one-hot encoded, even if the training instances are. Is this correct?
The way I imagine it is something along the following lines (made-up example):
Assume the vocabulary ['quick', 'fox', 'jumped', 'lazy', 'dog'] and a context of C=1, and suppose that for the input word 'jumped' the two output vectors look like this:
[0.2 0.6 0.01 0.1 0.09]
[0.2 0.2 0.01 0.16 0.43]
I would interpret this as 'fox' being the most likely word to show up before 'jumped' (p=0.6), and 'dog' being the most likely to show up after it (p=0.43).
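In code, I picture reading those two vectors back against the vocabulary roughly like this (a made-up numpy sketch of my mental model, not anything from an actual word2vec implementation):

    import numpy as np

    vocab = ['quick', 'fox', 'jumped', 'lazy', 'dog']

    # my two made-up output distributions for the input word 'jumped'
    before = np.array([0.2, 0.6, 0.01, 0.1, 0.09])
    after = np.array([0.2, 0.2, 0.01, 0.16, 0.43])

    print(vocab[np.argmax(before)], before.max())  # fox 0.6
    print(vocab[np.argmax(after)], after.max())    # dog 0.43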
Do I have this right? Or am I completely off? Any help is appreciated.
This is my first answer on SO, so here goes.
Your understanding of both parts seems to be correct, according to this paper:
http://arxiv.org/abs/1411.2738
The paper explains word2vec in detail while keeping it very simple; it's worth a read for a thorough understanding of the neural net architecture used in word2vec.
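To make the architecture concrete, here's a minimal numpy sketch of a single skip-gram forward pass in the spirit of that paper (the sizes, initialization, and variable names are my own, made up for illustration):

    import numpy as np

    V, N = 5, 3                      # vocabulary size, hidden layer size
    rng = np.random.default_rng(0)
    W = rng.normal(size=(V, N))      # input -> hidden weights
    W_out = rng.normal(size=(N, V))  # hidden -> output weights (W' in the paper)

    x = np.zeros(V)
    x[2] = 1.0                       # one-hot input for 'jumped' (index 2)

    h = W.T @ x                      # hidden layer: simply the row of W for 'jumped'
    u = h @ W_out                    # one raw score per vocabulary word
    y = np.exp(u - u.max())
    y /= y.sum()                     # softmax: one probability per vocabulary word

Each output vector is produced this way, turning raw scores into a probability distribution over the whole vocabulary, exactly as you describe.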
Referring to the example you mentioned, with C=1 and the vocabulary ['quick', 'fox', 'jumped', 'lazy', 'dog']:
If the output from the skip-gram is [0.2 0.6 0.01 0.1 0.09] and the correct target word is 'fox', then the error is calculated as
[0 1 0 0 0] - [0.2 0.6 0.01 0.1 0.09] = [-0.2 0.4 -0.01 -0.1 -0.09]
and the weight matrices are updated to minimize this error.
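As a rough illustration of that update (my own numpy sketch, not code from the paper): for softmax with cross-entropy loss, this error vector is the negative gradient of the loss with respect to the output scores, so a plain SGD step on W' looks like

    import numpy as np

    t = np.array([0, 1, 0, 0, 0], dtype=float)  # one-hot target for 'fox'
    y = np.array([0.2, 0.6, 0.01, 0.1, 0.09])   # softmax output of the network

    e = t - y                                   # the error vector computed above
    eta = 0.025                                 # made-up learning rate
    h = np.array([0.1, -0.3, 0.2])              # hypothetical hidden activations (N = 3)
    W_out = np.zeros((3, 5))                    # hypothetical W' of shape (N, V)
    W_out += eta * np.outer(h, e)               # SGD step: nudge W' to shrink the error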
Hope this helps!