What is an Embedding in Keras?

Tags:

keras

People also ask

What is embeddings in Tensorflow?

An embedding is a dense vector of floating-point values (the length of the vector is a parameter you specify). Instead of being specified manually, the values of the embedding are trainable parameters (weights learned by the model during training, in the same way a model learns the weights of a dense layer).
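To make "trainable parameters" concrete, here is a minimal NumPy sketch (not Keras code) of an embedding table with one dense vector per integer id. The sizes, the small uniform initialization, the toy squared-error loss and the helper `sgd_step` are all assumptions for illustration; the point is that one gradient step only changes the row that was looked up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 5, 3  # illustrative sizes (assumed)

# The embedding table: one trainable dense vector (row) per integer id.
# Small uniform random values are an assumed initialization for this sketch.
W = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))

def sgd_step(W, token_id, target, lr=0.1):
    """Hypothetical helper: one SGD step on 0.5 * ||W[token_id] - target||^2."""
    grad = W[token_id] - target   # gradient of the toy loss w.r.t. the looked-up vector
    W = W.copy()
    W[token_id] -= lr * grad      # only the looked-up row receives an update
    return W

target = np.array([1.0, 0.0, -1.0])
W_new = sgd_step(W, token_id=2, target=target)
changed_rows = np.where(np.any(W_new != W, axis=1))[0]
print(changed_rows)  # → [2]
```

In a real model the gradient comes from the network's loss rather than a fixed target, but the update pattern (only used rows change) is the same idea.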

What is an embedding in deep learning?

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words.

What is word embedding in LSTM?

An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training. (In MATLAB, this layer requires Deep Learning Toolbox™.)

As far as I know, the Embedding layer is a simple matrix multiplication that transforms words into their corresponding word embeddings.

The weights of the Embedding layer have the shape (vocabulary_size, embedding_dimension). For each training sample, the inputs are integers that represent certain words. The integers are in the range [0, vocabulary_size). The Embedding layer transforms each integer i into the i-th row of the embedding weight matrix.
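The "integer i becomes the i-th row" description can be sketched with plain NumPy integer-array indexing. The weight values below are made up (just `arange`) so the selected rows are easy to recognize; this illustrates the idea, not the Keras implementation.

```python
import numpy as np

vocabulary_size, embedding_dimension = 6, 4
# Hypothetical weights: row i is the embedding of the word with index i.
weights = np.arange(vocabulary_size * embedding_dimension, dtype=np.float32).reshape(
    vocabulary_size, embedding_dimension)

sample = [3, 0, 5]          # one training sample: word indices in [0, vocabulary_size)
embedded = weights[sample]  # integer-array indexing picks rows 3, 0 and 5
print(embedded.shape)       # → (3, 4)
print(embedded[0])          # → [12. 13. 14. 15.]
```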

In order to do this quickly as a matrix multiplication, the input integers are not stored as a list of integers but as a one-hot matrix. Therefore the input shape is (nb_words, vocabulary_size), with one non-zero value per row. If you multiply this by the embedding weights, you get the output in the shape

(nb_words, vocab_size) x (vocab_size, embedding_dim) = (nb_words, embedding_dim)
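The shape equation above can be checked numerically. This NumPy sketch (random weights, arbitrary word ids, both assumed) builds the one-hot matrix, multiplies it by the weights, and compares the result with a direct row lookup; the two agree because each one-hot row selects exactly one weight row.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embedding_dim = 8, 3
W = rng.normal(size=(vocab_size, embedding_dim))  # (vocab_size, embedding_dim)

word_ids = np.array([1, 4, 4, 7])                 # nb_words = 4
one_hot = np.eye(vocab_size)[word_ids]            # (nb_words, vocab_size), one 1 per row

via_matmul = one_hot @ W                          # (nb_words, embedding_dim)
via_lookup = W[word_ids]                          # same values, no multiplication

print(via_matmul.shape)                      # → (4, 3)
print(np.allclose(via_matmul, via_lookup))   # → True
```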

So with a simple matrix multiplication you transform all the words in a sample into the corresponding word embeddings.

The Keras Embedding layer does not perform any matrix multiplication; it only:

1. creates a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions

2. indexes this weight matrix

It is always useful to have a look at the source code to understand what a class does. In this case, we will have a look at the class Embedding which inherits from the base layer class called Layer.

(1) - Creating a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions:

This happens in the build method of Embedding:

def build(self, input_shape):
    self.embeddings = self.add_weight(
        shape=(self.input_dim, self.output_dim),
        initializer=self.embeddings_initializer,
        name='embeddings',
        regularizer=self.embeddings_regularizer,
        constraint=self.embeddings_constraint,
        dtype=self.dtype)
    self.built = True
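As a rough analogue of step (1), here is a hypothetical NumPy class that mirrors only the build step: it creates a single (input_dim, output_dim) matrix and nothing else (no bias vector). The class name `ToyEmbedding`, the seed and the uniform initialization range are assumptions for illustration, not the real Keras class.

```python
import numpy as np

class ToyEmbedding:
    """Hypothetical NumPy stand-in for the build step only (not Keras code)."""

    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim    # vocabulary size
        self.output_dim = output_dim  # embedding dimension
        self.built = False

    def build(self, seed=0):
        rng = np.random.default_rng(seed)
        # Mirrors add_weight(shape=(input_dim, output_dim), ...): one matrix, no bias.
        self.embeddings = rng.uniform(-0.05, 0.05, size=(self.input_dim, self.output_dim))
        self.built = True

layer = ToyEmbedding(input_dim=10_000, output_dim=64)
layer.build()
print(layer.embeddings.shape)  # → (10000, 64)
print(layer.embeddings.size)   # → 640000
```

The parameter count is simply vocabulary_size × embedding_dimension, which is why large vocabularies make this matrix the biggest part of many text models.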

If you look at the base class Layer, you will see that the add_weight method used above simply creates a matrix of trainable weights (in this case of (vocabulary_size)x(embedding_dimension) dimensions):

def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    """Adds a weight variable to the layer.
    # Arguments
        name: String, the name for the weight variable.
        shape: The shape tuple of the weight.
        dtype: The dtype of the weight.
        initializer: An Initializer instance (callable).
        regularizer: An optional Regularizer instance.
        trainable: A boolean, whether the weight should
            be trained via backprop or not (assuming
            that the layer itself is also trainable).
        constraint: An optional Constraint instance.
    # Returns
        The created weight variable.
    """
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = K.floatx()
    weight = K.variable(initializer(shape),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight
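The bookkeeping that add_weight performs (create the weight from an initializer, register an optional regularization loss, and file the weight as trainable or non-trainable) can be mimicked in NumPy. Everything here is hypothetical: `ToyLayer`, the `uniform` initializer and the `l2` regularizer are illustrative stand-ins, and real Keras works with backend variables rather than plain arrays.

```python
import numpy as np

class ToyLayer:
    """Hypothetical NumPy mimic of the bookkeeping in Layer.add_weight above."""

    def __init__(self):
        self._trainable_weights = []
        self._non_trainable_weights = []
        self.losses = []

    def add_weight(self, name, shape, initializer, regularizer=None, trainable=True):
        weight = initializer(shape)                  # create the array from the initializer
        if regularizer is not None:
            self.losses.append(regularizer(weight))  # extra penalty term for the loss
        if trainable:
            self._trainable_weights.append((name, weight))
        else:
            self._non_trainable_weights.append((name, weight))
        return weight

rng = np.random.default_rng(0)
uniform = lambda shape: rng.uniform(-0.05, 0.05, size=shape)  # assumed initializer
l2 = lambda w: 0.01 * float(np.sum(w ** 2))                   # assumed L2 regularizer

layer = ToyLayer()
emb = layer.add_weight('embeddings', (5, 3), uniform, regularizer=l2)
print(emb.shape, len(layer._trainable_weights), len(layer.losses))  # → (5, 3) 1 1
```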

(2) - Indexing this weight matrix

This happens in the call method of Embedding:

def call(self, inputs):
    if K.dtype(inputs) != 'int32':
        inputs = K.cast(inputs, 'int32')
    out = K.gather(self.embeddings, inputs)
    return out
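A NumPy sketch of the same call logic, assuming `np.take(..., axis=0)` as a stand-in for `K.gather`: cast the inputs to int32, then gather rows. With a 2D batch of indices of shape (batch_size, sequence_length) the output gains a trailing embedding_dim axis. The table values and the batch are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(7, 2))  # (vocab_size=7, embedding_dim=2), made-up values

def call(inputs):
    """NumPy sketch of Embedding.call: cast to int32, then gather rows."""
    inputs = np.asarray(inputs)
    if inputs.dtype != np.int32:
        inputs = inputs.astype(np.int32)       # like K.cast(inputs, 'int32')
    return np.take(embeddings, inputs, axis=0)  # like K.gather(self.embeddings, inputs)

batch = np.array([[0, 6, 2],
                  [5, 5, 1]], dtype=np.float32)  # (batch_size, sequence_length), float ids
out = call(batch)
print(out.shape)  # → (2, 3, 2)
```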

This function returns the output of the Embedding layer, which is K.gather(self.embeddings, inputs). tf.keras.backend.gather indexes the weight matrix self.embeddings (see the build function above) according to inputs, which should be lists of non-negative integers.

These lists can be obtained, for example, by passing your text/words to the one_hot function of Keras, which encodes a text into a list of word indexes from a vocabulary of size n (this is NOT one-hot encoding; see also this example for more info: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/).
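Keras' one_hot produces integer indexes via a hashing trick rather than a one-hot matrix. The helper below is a hypothetical sketch of that idea, not the Keras function: it lowercases and splits on whitespace, and it uses MD5 (an assumption) so results are reproducible, since Python's built-in `hash` of strings changes between processes. Index 0 is left unused, and different words may collide on the same index.

```python
import hashlib

def one_hot_like(text, n):
    """Hypothetical hashing-trick encoder: maps each word to an index in [1, n).

    Index 0 is left unused. Collisions are possible: two different words
    can share an index.
    """
    indexes = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        indexes.append(int(digest, 16) % (n - 1) + 1)
    return indexes

ids = one_hot_like("The cat sat on the mat", n=50)
print(len(ids))          # → 6
print(ids[0] == ids[4])  # → True
```

The resulting list is exactly the kind of integer input the Embedding layer's gather expects.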

That's all: there is no matrix multiplication.

On the contrary, the Keras Embedding layer is useful precisely because it avoids performing a matrix multiplication, which saves computational resources.
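A back-of-the-envelope count shows where the savings come from. The sizes are assumed for illustration, and the count ignores memory traffic and the cost of building the one-hot matrix itself: the multiplication does vocab_size times more work than the lookup.

```python
def matmul_cost(nb_words, vocab_size, embedding_dim):
    """Multiply-adds for (nb_words, vocab_size) @ (vocab_size, embedding_dim)."""
    return nb_words * vocab_size * embedding_dim

def lookup_cost(nb_words, embedding_dim):
    """Values copied when gathering nb_words rows of length embedding_dim."""
    return nb_words * embedding_dim

# Assumed sizes: a 100-word sample, 20,000-word vocabulary, 128-dim embeddings.
print(matmul_cost(100, 20_000, 128))  # → 256000000
print(lookup_cost(100, 128))          # → 12800
```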

Otherwise, you could just use a Keras Dense layer (after one-hot encoding your input data) to get a matrix of trainable weights of (vocabulary_size)x(embedding_dimension) dimensions, and then simply do the multiplication to get an output that is exactly the same as the output of the Embedding layer (provided the Dense layer uses no bias and no activation).
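The equivalence also holds for training. In this NumPy sketch (random upstream gradient and word ids, both assumed), the weight gradient of a bias-free Dense layer applied to one-hot inputs, `one_hot.T @ grad_out`, equals scattering the upstream gradient into the looked-up rows (here with `np.add.at`), which is what an embedding lookup needs. Repeated ids accumulate, and unused rows get zero gradient.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, embedding_dim = 6, 4
word_ids = np.array([2, 5, 2])          # word 2 appears twice
one_hot = np.eye(vocab_size)[word_ids]  # (3, vocab_size)

# Assumed upstream gradient dL/d(output) for the (3, embedding_dim) layer output.
grad_out = rng.normal(size=(len(word_ids), embedding_dim))

# Dense layer without bias: output = one_hot @ W, so dL/dW = one_hot.T @ grad_out.
grad_dense = one_hot.T @ grad_out

# Embedding lookup: the gradient is scattered (summed) into the looked-up rows only.
grad_embedding = np.zeros((vocab_size, embedding_dim))
np.add.at(grad_embedding, word_ids, grad_out)

print(np.allclose(grad_dense, grad_embedding))     # → True
print(np.flatnonzero(grad_embedding.any(axis=1)))  # → [2 5]
```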

In Keras, the Embedding layer is NOT a simple matrix multiplication layer, but a look-up table layer (see call function below or the original definition).

def call(self, inputs):
    if K.dtype(inputs) != 'int32':
        inputs = K.cast(inputs, 'int32')
    out = K.gather(self.embeddings, inputs)
    return out

What it does is map each known integer n in inputs to a trainable feature vector W[n], whose dimension is the so-called embedded feature length.
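"Known integer" means n must be a valid row index, 0 <= n < input_dim. Plain NumPy indexing would silently wrap a negative id to a row at the end of the table, so this hypothetical `lookup` helper checks the range explicitly before returning W[n]. How a real backend handles out-of-range ids is not covered by this sketch.

```python
import numpy as np

def lookup(W, ids):
    """Hypothetical checked lookup: each id n must satisfy 0 <= n < W.shape[0]."""
    ids = np.asarray(ids)
    if ids.min() < 0 or ids.max() >= W.shape[0]:
        raise ValueError(f"ids must be in [0, {W.shape[0]}), got {ids.min()}..{ids.max()}")
    return W[ids]

W = np.ones((4, 2))             # input_dim = 4 known integers, feature length 2
print(lookup(W, [0, 3]).shape)  # → (2, 2)
try:
    lookup(W, [4])
except ValueError as err:
    print(err)                  # → ids must be in [0, 4), got 4..4
```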

In simple words (from a functional point of view), it is equivalent to a one-hot encoder followed by a fully-connected layer without bias. The layer weights are trainable.
