 

Keras embedding layer masking. Why does input_dim need to be |vocabulary| + 2?

In the Keras docs for Embedding https://keras.io/layers/embeddings/, the explanation given for mask_zero is

mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal |vocabulary| + 2).

Why does input_dim need to be 2 + number of words in vocabulary? Assuming 0 is masked and can't be used, shouldn't it just be 1 + number of words? What is the other extra entry for?

asked Apr 05 '17 by Nigel Ng

People also ask

What is Input_dim in embedding layer?

input_dim: Integer. Size of the vocabulary, i.e. maximum integer index + 1. output_dim: Integer. Dimension of the dense embedding.
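
For instance, a minimal sketch (assuming the tf.keras import path; the toy sizes are made up):

from tensorflow.keras.layers import Embedding

# indices 0..9 occur in the data, so input_dim = maximum index + 1 = 10
layer = Embedding(input_dim=10, output_dim=4)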

What is the purpose of the embedding layer and the dense layer? What does their size parameter help with in the model?

The embedding layer enables us to convert each word into a fixed-length vector of a defined size. The resulting vector is dense, with real values instead of just 0s and 1s. The fixed length of the word vectors helps us represent words more effectively, with reduced dimensionality.
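
As a quick illustration (a sketch assuming tf.keras and NumPy; the indices are hypothetical):

import numpy as np
from tensorflow.keras.layers import Embedding

layer = Embedding(input_dim=1000, output_dim=64)
vectors = layer(np.array([[4, 17, 23]]))  # a sequence of three word indices
print(vectors.shape)  # (1, 3, 64): each index becomes a dense 64-dim vector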

What is mask zero in embedding?

Actually, setting mask_zero=True for the Embedding layer does not result in returning a zero vector. Rather, the behavior of the Embedding layer does not change: it still returns the embedding vector at index zero, but additionally produces a mask that downstream layers can use to ignore those timesteps.
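
You can check this behavior directly (a sketch assuming tf.keras; the sizes are arbitrary):

import numpy as np
from tensorflow.keras.layers import Embedding

layer = Embedding(input_dim=10, output_dim=4, mask_zero=True)
ids = np.array([[1, 2, 0, 0]])
print(layer(ids)[0, 2])         # not a zero vector: index 0 still has a row
print(layer.compute_mask(ids))  # [[ True  True False False]]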

What is the purpose of embedding layer and dense layer?

A Dense layer will treat these like actual weights with which to perform matrix multiplication. An embedding layer will simply treat these weights as a list of vectors, each vector representing one word; the 0th word in the vocabulary is w[0] , 1st is w[1] , etc.
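
A small sketch of that lookup behavior (assuming tf.keras; the sizes are arbitrary):

import numpy as np
from tensorflow.keras.layers import Embedding

layer = Embedding(input_dim=5, output_dim=3)
out = layer(np.array([2]))     # first call builds the weights
w = layer.get_weights()[0]     # shape (5, 3): one vector per word
print(np.allclose(out, w[2]))  # True: a table lookup, not a matrix multiply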


1 Answer

I believe the docs are a bit misleading there. In the normal case you are mapping your n input data indices [0, 1, 2, ..., n-1] to vectors, so your input_dim should be as many elements as you have:

input_dim = len(vocabulary_indices)

An equivalent (but slightly confusing) way to say this, which is how the docs phrase it, is

1 + maximum integer index occurring in the input data.

input_dim = max(vocabulary_indices) + 1
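
A quick check of the non-masking case (the toy vocabulary is an assumption):

vocabulary_indices = [0, 1, 2, 3, 4]  # 5 words, indices 0..4
len(vocabulary_indices)               # 5
max(vocabulary_indices) + 1           # 5, the same number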

If you enable masking, the value 0 is treated as a special padding index, so you shift your n indices up by one and reserve 0 for padding, giving the index range [0, 1, 2, ..., n-1, n]; thus you need

input_dim = len(vocabulary_indices) + 1

or alternatively

input_dim = max(vocabulary_indices) + 2
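
Again as a toy check (the vocabulary is hypothetical):

vocabulary_indices = [0, 1, 2, 3, 4]  # the original 5 word indices
# shift every index up by one so 0 is free for padding: words now use 1..5
len(vocabulary_indices) + 1           # 6
max(vocabulary_indices) + 2           # 6, the same number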

The docs become especially confusing here as they say

(input_dim should equal |vocabulary| + 2)

where I would interpret |x| as the cardinality of a set (equivalent to len(x)), but the authors seem to mean

2 + maximum integer index occurring in the input data.
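
Putting it together, a minimal runnable sketch (assuming tf.keras; the toy sizes are made up):

from tensorflow.keras.layers import Embedding, LSTM
from tensorflow.keras.models import Sequential

n_words = 5  # words use indices 1..5; index 0 is reserved for padding
model = Sequential([
    Embedding(input_dim=n_words + 1, output_dim=8, mask_zero=True),
    LSTM(16),  # recurrent layers support the mask propagated by Embedding
])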

answered Oct 10 '22 by Nils Werner