 

Word Embedding, LookupTable, Word Embedding Visualizations

I need to ask a few questions regarding word embeddings... they might be basic.

  1. When we convert a one-hot vector of a word, for instance king [0 0 0 1 0], into an embedded vector E = [0.2, 0.4, 0.2, 0.2], is there any significance to each individual index of the resulting word vector? For instance, E[1], which is 0.2: what specifically does E[1] represent (I know it is basically a transformation into another space), or does the word vector define context only collectively, not element by element?
  2. How does the dimension of the word vector (reduced or increased) matter compared to the original one-hot vector?
  3. How can we define a lookup table in terms of an embedding layer?
  4. Is the lookup table a kind of randomly generated table, or has it already been trained separately on the instances in the data, so that we just use it later in the neural network operations?
  5. Is there any method to visualize an embedded vector at a hidden layer (as we have in image-based neural network processing)?

Thanks in advance

asked Jul 03 '17 by Zaheer Babar


People also ask

How do I visualize a word embedding?

To visualize the word embedding, we are going to use common dimensionality reduction techniques such as PCA and t-SNE. To map the words into their vector representations in embedding space, the pre-trained word embedding GloVe will be used.
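A minimal sketch of this idea, assuming pre-trained GloVe vectors are available in a local text file (the file name glove.6B.50d.txt and the word list are only illustrative): it projects a handful of vectors to 2-D with scikit-learn's PCA and plots them with matplotlib. TSNE from sklearn.manifold can be swapped in for t-SNE.

    # Sketch: project a few GloVe vectors to 2-D with PCA and plot them.
    # Assumes a local GloVe text file, e.g. "glove.6B.50d.txt" (illustrative name).
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def load_glove(path, wanted):
        # Read only the words we care about from the GloVe .txt file.
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                if parts[0] in wanted:
                    vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
        return vectors

    words = ["king", "queen", "man", "woman", "frog", "toad", "paris", "rome"]
    vecs = load_glove("glove.6B.50d.txt", set(words))

    points = PCA(n_components=2).fit_transform(np.stack([vecs[w] for w in words]))
    plt.scatter(points[:, 0], points[:, 1])
    for (x, y), w in zip(points, words):
        plt.annotate(w, (x, y))
    plt.title("Word embeddings projected to 2-D with PCA")
    plt.show()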

How do you visualize a Word2Vec model?

One way to check if we have a good word2vec model is to use the model to find the most similar words to a specific word. For that, we can use the most_similar function, which returns the 10 most similar words to the given word. Let's find the most similar words to the word "blue".
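A minimal sketch of that check, assuming gensim is installed; gensim.downloader fetches a small pre-trained model the first time it is called (the model name glove-wiki-gigaword-50 is just one convenient choice):

    # Sketch: find the 10 nearest neighbours of "blue" with gensim's most_similar.
    import gensim.downloader as api

    model = api.load("glove-wiki-gigaword-50")   # KeyedVectors, downloaded on first use
    for word, score in model.most_similar("blue", topn=10):
        print(f"{word}\t{score:.3f}")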

What are word embedding models?

A word embedding is a learned representation for text where words that have the same meaning have a similar representation. It is this approach to representing words and documents that may be considered one of the key breakthroughs of deep learning on challenging natural language processing problems.

Why might being able to visualize word embeddings be useful?

This type of relationship between embeddings is very useful for finding relations between words. With vector operations, it is possible to discover words used in similar contexts ("Rome" and "Paris"), solve word analogies, and create visualizations of similar words.


1 Answer

1: Each element (or group of elements) in an embedding vector has some meaning, but it is mostly opaque to humans. Depending on which algorithm you use, a word embedding vector may carry a different kind of meaning, but it is usually useful. For example, with GloVe, similar words such as 'frog' and 'toad' sit near each other in the vector space, and King - Man + Woman results in a vector similar to Queen.
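As a rough illustration of those two properties (nearest neighbours and analogies), here is a sketch using gensim with pre-trained GloVe vectors; the exact neighbours depend on which model you load:

    # Sketch: nearest neighbours and the King - Man + Woman analogy with gensim.
    import gensim.downloader as api

    glove = api.load("glove-wiki-gigaword-50")

    print(glove.most_similar("frog", topn=5))    # words like "toad" tend to appear
    print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # "queen" is usually near the top of this list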

  3. Turn the vocabulary into indices. For example, you have the vocabulary list [dog, cat, mouse, feed, play, with]. Then the sentence "Dog play with cat" becomes 0, 4, 5, 1. Meanwhile, you have an embedding matrix as follows:

    [0.1, 0.1, 0] # comment: this is dog
    [0.2, 0.5, 0.1] # this is cat
    [...]
    [...]
    [...]
    [...]

where the first row is the embedding vector of dog, the second row is cat, and so on. Then the indices (0, 4, 5, 1), after lookup, become the matrix [[0.1, 0.1, 0], [...], [...], [0.2, 0.5, 0.1]].
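A minimal sketch of that lookup in plain NumPy, keeping the dog/cat rows from the matrix above (the remaining rows are made-up filler values):

    # Sketch: look up embedding rows by index; rows 0 and 1 match the example above,
    # the rest are made-up values standing in for the "[...]" rows.
    import numpy as np

    embedding_matrix = np.array([
        [0.1, 0.1, 0.0],   # dog
        [0.2, 0.5, 0.1],   # cat
        [0.3, 0.2, 0.4],   # mouse (filler)
        [0.7, 0.1, 0.2],   # feed  (filler)
        [0.4, 0.6, 0.3],   # play  (filler)
        [0.5, 0.5, 0.5],   # with  (filler)
    ])

    sentence = [0, 4, 5, 1]                    # "dog play with cat" as indices
    looked_up = embedding_matrix[sentence]     # shape (4, 3): one row per token
    print(looked_up)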

  4. Either or both (see the sketch after this list):
    • You can randomly initialize the embedding vectors and train them with gradient descent.
    • You can take pretrained word vectors and keep them fixed (i.e. read-only, no change). You can train your word vectors in one model and use them in another model, or you can download pretrained word vectors online, for example Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip from GloVe.
    • You can initialize with pretrained word vectors and keep training them with your model by gradient descent.
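Here is a sketch of those three options using PyTorch's nn.Embedding as one possible framework; the pretrained tensor is just a random placeholder standing in for real GloVe/word2vec weights:

    # Sketch: random init vs frozen pretrained vs fine-tuned pretrained embeddings.
    import torch
    import torch.nn as nn

    vocab_size, dim = 6, 3
    pretrained = torch.rand(vocab_size, dim)   # placeholder for real pretrained weights

    random_emb    = nn.Embedding(vocab_size, dim)                          # trained from scratch
    frozen_emb    = nn.Embedding.from_pretrained(pretrained, freeze=True)  # fixed, no gradients
    finetuned_emb = nn.Embedding.from_pretrained(pretrained, freeze=False) # starts pretrained, keeps training

    indices = torch.tensor([0, 4, 5, 1])       # "dog play with cat"
    print(random_emb(indices).shape)           # torch.Size([4, 3])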

Update: A one-hot vector does not contain any information beyond the identity of the word; you can think of the one-hot vector as just the index of that word in the vocabulary. For example, Dog => [1, 0, 0, 0, 0, 0] and cat => [0, 1, 0, 0, 0, 0]. There are some differences between one-hot vectors and indices:

  • If you input a list of indices, [0, 4, 5, 1], directly to your multi-layer perceptron, it cannot learn anything (I tried...). But if you input a matrix of one-hot vectors, [[...1][1...][...][...]], it learns something. However, this is costly in terms of RAM and CPU.

  • One-hot vectors cost a lot of memory just to store zeros (as sketched below). Thus, I suggest randomly initializing an embedding matrix if you don't have one, storing the dataset as indices, and using the indices to look up embedding vectors.
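A small sketch of that memory difference, turning the same sentence into one-hot rows versus plain indices (vocabulary size 6, as in the example above):

    # Sketch: one-hot rows store mostly zeros; indices store one small integer per token.
    import numpy as np

    vocab_size = 6
    indices = np.array([0, 4, 5, 1])

    one_hot = np.eye(vocab_size, dtype=np.float32)[indices]   # shape (4, 6)
    print(one_hot.nbytes, "bytes as one-hot")                 # grows with vocab size
    print(indices.nbytes, "bytes as indices")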

"its mean that lookup table is just a matrix of embedded vectors (already been trained seperately via word2vec or...) for each word in the vocabulary. and while in the process of neural network either we can use an Embedding Layer or we can just refer to embedded vector in lookup table for that particular embedded vector against particular one-hot vector."

Use the "INDEX" to look-up in lookup table. Turn dog into 0, cat into 1. One-hot vector and index contain same information, but one-hot cost more memory to store. Moreover, a lot of deeplearning framework accept index as input to embedding layer (which, output is a vector represent for a word in that index.)

". How we get this embedding vector..."

=> Read the papers. There are papers on both Word2vec and GloVe. Ask your lecturers for more detail; they are willing to help you.

answered Oct 05 '22 by Haha TTpro