
Explain with example: how the Embedding layer in Keras works


I don't understand the Embedding layer of Keras. Although there are lots of articles explaining it, I am still confused. For example, the code below is from an IMDB sentiment analysis example:

top_words = 5000
max_review_length = 500
embedding_vecor_length = 32    

model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, nb_epoch=3, batch_size=64)

In this code, what exactly is the embedding layer doing? What would the output of the embedding layer be? It would be nice if someone could explain it with some examples!

user1670773 asked Aug 12 '17 11:08



1 Answer

The Embedding layer creates an embedding vector for each input word, much like word2vec or precomputed GloVe vectors would (I still don't fully understand the math myself).

Before I get to your code, let's make a short example.

texts = ['This is a text','This is not a text']

First we turn these sentences into vectors of integers, where each number is the index assigned to a word in the dictionary and the order of the numbers follows the order of the words in the sentence.

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

max_review_length = 6  # maximum length of a sentence
embedding_vecor_length = 3
top_words = 10

# num_words is the maximum number of words to keep; if the texts contain more, only the most frequent ones are used
tokenizer = Tokenizer(num_words=top_words)
tokenizer.fit_on_texts(texts)  # builds the word index; without this call the sequences come out empty
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index
input_dim = len(word_index) + 1  # vocabulary size plus index 0, which is used for padding
print('Found %s unique tokens.' % len(word_index))

# max_review_length is the length every sequence is padded (or trimmed) to, giving vectors like [... 0,0,1,3,50] where 1,3,50 are individual words
data = pad_sequences(sequences, maxlen=max_review_length)

print('Shape of data tensor:', data.shape)

'This is a text' --> [0 0 1 2 3 4]
'This is not a text' --> [0 1 2 5 3 4]
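To make the encoding step concrete, here is a minimal pure-Python sketch of roughly what the tokenizer and the padding do for this example. It is an approximation, not the Keras implementation: the helper names build_word_index, encode and pad_left are made up for illustration, and the sketch assumes lowercasing, splitting on whitespace, indexes ranked by frequency starting at 1 (ties broken by first appearance), and left-padding with 0.

```python
from collections import Counter

texts = ['This is a text', 'This is not a text']

def build_word_index(texts):
    """Rank words by frequency; index 0 stays reserved for padding."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() keeps first-seen order for words with equal counts
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def encode(text, word_index):
    """Replace each known word by its integer index."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

def pad_left(seq, maxlen):
    """Keep the last maxlen items and fill the front with zeros."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

word_index = build_word_index(texts)
data = [pad_left(encode(t, word_index), 6) for t in texts]

print(word_index)  # {'this': 1, 'is': 2, 'a': 3, 'text': 4, 'not': 5}
print(data)        # [[0, 0, 1, 2, 3, 4], [0, 1, 2, 5, 3, 4]]
```

The word "not" gets index 5 because it is the least frequent word, which is why it shows up as 5 in the second sequence.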

Now you can feed these into the embedding layer:

from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length, mask_zero=True))
model.compile(optimizer='adam', loss='categorical_crossentropy')
output_array = model.predict(data)

output_array is an array of shape (2, 6, 3): 2 is the number of input reviews (sentences in my case), 6 is the maximum number of words in each review (max_review_length), and 3 is embedding_vecor_length. E.g.

array([[[-0.01494285, -0.007915  ,  0.01764857],
    [-0.01494285, -0.007915  ,  0.01764857],
    [-0.03019481, -0.02910612,  0.03518577],
    [-0.0046863 ,  0.04763055, -0.02629668],
    [ 0.02297204,  0.02146662,  0.03114786],
    [ 0.01634104,  0.02296363, -0.02348827]],

   [[-0.01494285, -0.007915  ,  0.01764857],
    [-0.03019481, -0.02910612,  0.03518577],
    [-0.0046863 ,  0.04763055, -0.02629668],
    [-0.01736645, -0.03719328,  0.02757809],
    [ 0.02297204,  0.02146662,  0.03114786],
    [ 0.01634104,  0.02296363, -0.02348827]]], dtype=float32)
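Conceptually, the forward pass of the layer is just a table lookup: each integer index is replaced by the matching row of a weight table. The sketch below imitates that in plain Python. The names table and output are made up for illustration, and the random value range and seed are assumptions, so the numbers will differ from the Keras output above; only the shape and the repetition pattern carry over.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable

top_words = 10
embedding_vecor_length = 3

# One row of 3 small random floats per word index 0..9
table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_vecor_length)]
         for _ in range(top_words)]

data = [[0, 0, 1, 2, 3, 4],
        [0, 1, 2, 5, 3, 4]]

# The forward pass is a row lookup: each index is replaced by its row
output = [[table[idx] for idx in sentence] for sentence in data]

shape = (len(output), len(output[0]), len(output[0][0]))
print(shape)                         # (2, 6, 3)
print(output[0][2] == output[1][1])  # True: both positions hold word index 1
```

This is why identical rows appear in the array above wherever the same word index occurs, including the padding index 0.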

In your case the vocabulary is a list of 5000 words, each review is at most 500 words long (longer ones are trimmed), and the layer turns each of these 500 words into a vector of size 32.
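The sizes involved for the model in the question can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch; batch_size is taken from the fit call in the question, and the variable names embedding_params and output_shape are made up for illustration.

```python
top_words = 5000
max_review_length = 500
embedding_vecor_length = 32
batch_size = 64

# The embedding table has one row per word index and one column per dimension
embedding_params = top_words * embedding_vecor_length

# Each review of 500 indexes becomes a 500 x 32 matrix
output_shape = (batch_size, max_review_length, embedding_vecor_length)

print(embedding_params)  # 160000
print(output_shape)      # (64, 500, 32)
```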

You can get the mapping between the word indexes and the embedding vectors by running:

model.layers[0].get_weights()

In the output below top_words was 10, so the mapping has 10 rows, one per word index. You can see that the rows for indexes 0, 1, 2, 3, 4 and 5 are the same vectors that appear in output_array above.

[array([[-0.01494285, -0.007915  ,  0.01764857],
    [-0.03019481, -0.02910612,  0.03518577],
    [-0.0046863 ,  0.04763055, -0.02629668],
    [ 0.02297204,  0.02146662,  0.03114786],
    [ 0.01634104,  0.02296363, -0.02348827],
    [-0.01736645, -0.03719328,  0.02757809],
    [ 0.0100757 , -0.03956784,  0.03794377],
    [-0.02672029, -0.00879055, -0.039394  ],
    [-0.00949502, -0.02805768, -0.04179233],
    [ 0.0180716 ,  0.03622523,  0.02232374]], dtype=float32)]

As mentioned in https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work these vectors are initialized randomly and then optimized by the network's optimizer, just like any other parameter of the network.
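What "optimized like any other parameter" means for a lookup table can be sketched with one step of plain gradient descent. Everything here is an assumption for illustration: the loss is a made-up sum of squares of the looked-up values, the learning rate is arbitrary, and Keras with adam applies a different update rule. The point is only that rows receive a gradient through the positions where they were looked up.

```python
import random

random.seed(1)  # arbitrary seed for repeatability

table = [[random.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(10)]
before = [row[:] for row in table]  # copy to compare against after the update

data = [[0, 0, 1, 2, 3, 4],
        [0, 1, 2, 5, 3, 4]]

# Toy loss: sum of squares of every looked-up value.
# Its gradient for a row is 2 * row, added once per time the row is looked up.
grads = [[0.0] * 3 for _ in table]
for sentence in data:
    for idx in sentence:
        for k in range(3):
            grads[idx][k] += 2.0 * table[idx][k]

# One gradient descent step on the whole table
learning_rate = 0.1
for idx, grad in enumerate(grads):
    for k in range(3):
        table[idx][k] -= learning_rate * grad[k]

changed = [idx for idx in range(10) if table[idx] != before[idx]]
print(changed)  # [0, 1, 2, 3, 4, 5]
```

Rows 6 to 9 never appear in data, so their gradient is zero and this step leaves them at their random starting values.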

Vaasha answered Sep 28 '22 08:09
