I need to ask a few questions regarding word embeddings... they could be basic.
Suppose a one-hot vector
[0 0 0 1 0]
is transformed into an embedded vector E = [0.2, 0.4, 0.2, 0.2].
Is there any importance to each index in the resultant word vector? For instance, E[1], which is 0.2... what specifically does E[1] define (although I know it's basically a transformation into another space)? Or does the word vector define context collectively but not individually? Thanks in advance.
To visualize the word embeddings, we are going to use common dimensionality reduction techniques such as PCA and t-SNE. To map the words to their vector representations in embedding space, the pre-trained GloVe word embeddings will be used.
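As a minimal sketch of that workflow (assuming gensim's downloader copy of GloVe, glove-wiki-gigaword-50, plus scikit-learn and matplotlib; the word list is an arbitrary choice, not from the original text), the projection could look like this:

# Project a few GloVe vectors to 2-D with PCA and plot them.
import gensim.downloader as api
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

glove = api.load("glove-wiki-gigaword-50")            # 50-dimensional GloVe vectors
words = ["king", "queen", "man", "woman", "paris", "rome", "blue", "red"]
vectors = [glove[w] for w in words]

coords = PCA(n_components=2).fit_transform(vectors)   # reduce 50-D -> 2-D

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.title("GloVe embeddings projected with PCA")
plt.show()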
One way to check whether we have a good word2vec model is to use the model to find the most similar words to a specific word. For that, we can use the most_similar function, which returns the 10 most similar words to the given word. Let's find the most similar words to the word blue.
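A hedged sketch of that check with gensim (the text8 training corpus and the hyperparameters below are stand-ins, not from the original text):

# Train a throwaway Word2Vec model and list the 10 nearest neighbours of "blue".
import gensim.downloader as api
from gensim.models import Word2Vec

corpus = api.load("text8")                  # an iterable of tokenised sentences
model = Word2Vec(corpus, vector_size=100, window=5, min_count=5, workers=4)

for word, score in model.wv.most_similar("blue", topn=10):
    print(f"{word:>12s}  {score:.3f}")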
A word embedding is a learned representation for text where words that have the same meaning have a similar representation. It is this approach to representing words and documents that may be considered one of the key breakthroughs of deep learning on challenging natural language processing problems.
Word Embedding Properties: this type of relationship between embeddings is very useful for finding relations between words. With vector operations, it is possible to discover words used in similar contexts ("Rome" and "Paris"), solve word analogies, and create visualizations of similar words.
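For illustration, a small sketch of that vector arithmetic, reusing the GloVe vectors loaded in the PCA example above (gensim's most_similar accepts positive and negative word lists):

# Solve word analogies with vector arithmetic on the GloVe vectors.
# king - man + woman is expected to land near "queen";
# paris - france + italy is expected to land near "rome".
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
print(glove.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))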
1: Each element (or group of elements) in an embedding vector has some meaning, but it is mostly not interpretable by humans. Depending on which algorithm you use, a word embedding vector may have a different meaning, but it is usually useful. For example, in GloVe, similar words such as 'frog' and 'toad' stay near each other in vector space, and King - Man results in a vector similar to Queen.
2: Turn the vocabulary into indices. For example, say you have the vocabulary list: [dog, cat, mouse, feed, play, with]. Then the sentence "Dog play with cat" becomes 0, 4, 5, 1. Meanwhile, you have an embedding matrix as follows:
[0.1, 0.1, 0] # this is dog
[0.2, 0.5, 0.1] # this is cat
[...]
[...]
[...]
[...]
where the first row is the embedding vector of dog, the second row is cat, and so on. Then the indices (0, 4, 5, 1), after lookup, become the matrix [[0.1, 0.1, 0], [...], [...], [0.2, 0.5, 0.1]].
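As a small numpy sketch of that lookup (the rows for mouse, feed, play and with are made-up values, since the answer elides them):

import numpy as np

# index = position in the vocabulary: [dog, cat, mouse, feed, play, with]
embedding_matrix = np.array([
    [0.1, 0.1, 0.0],   # row 0: dog
    [0.2, 0.5, 0.1],   # row 1: cat
    [0.3, 0.2, 0.4],   # row 2: mouse (made-up)
    [0.5, 0.1, 0.2],   # row 3: feed  (made-up)
    [0.4, 0.4, 0.3],   # row 4: play  (made-up)
    [0.1, 0.6, 0.2],   # row 5: with  (made-up)
])

sentence = [0, 4, 5, 1]                   # "Dog play with cat" as indices
looked_up = embedding_matrix[sentence]    # shape (4, 3): one row per word
print(looked_up)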
Update: A one-hot vector does not contain any information in itself; you can think of a one-hot vector as just the index of that word in the vocabulary. For example, dog => [1, 0, 0, 0, 0, 0] and cat => [0, 1, 0, 0, 0, 0]. There are some differences between one-hot and index:
If you input a list of indices, [0, 4, 5, 1], to your multi-layer perceptron, it cannot learn anything (I tried...). But if you input a matrix of one-hot vectors, [[...1][1...][...][...]], it learns something, although this is costly in terms of RAM and CPU.
One-hot costs a lot of memory just to store zeros. Thus, I suggest randomly initializing an embedding matrix if you don't have one, storing the dataset as indices, and using the indices to look up the embedding vectors.
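A small sketch of why the two inputs are equivalent (it reuses the embedding_matrix from the lookup sketch above): multiplying a one-hot vector by a weight matrix just selects one row, which is exactly what the index lookup does without storing all the zeros.

one_hot_dog = np.array([1, 0, 0, 0, 0, 0])      # "dog" as a one-hot vector
via_matmul = one_hot_dog @ embedding_matrix     # dense multiply over mostly zeros
via_lookup = embedding_matrix[0]                # direct index lookup: same row

assert np.allclose(via_matmul, via_lookup)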
"its mean that lookup table is just a matrix of embedded vectors (already been trained seperately via word2vec or...) for each word in the vocabulary. and while in the process of neural network either we can use an Embedding Layer or we can just refer to embedded vector in lookup table for that particular embedded vector against particular one-hot vector."
Use the "INDEX" to look-up in lookup table. Turn dog into 0, cat into 1. One-hot vector and index contain same information, but one-hot cost more memory to store. Moreover, a lot of deeplearning framework accept index as input to embedding layer (which, output is a vector represent for a word in that index.)
". How we get this embedding vector..."
=> Read the papers. Here are the papers on Word2vec and GloVe. Ask your lecturers for more detail; they are willing to help you.