In my model, I use GloVe pre-trained embeddings. I wish to keep them non-trainable in order to decrease the number of model parameters and avoid overfitting. However, I have a special symbol whose embedding I do want to train.
Using the provided Embedding layer, I can only use the trainable parameter to set the trainability of all embeddings at once, in the following way:
embedding_layer = Embedding(voc_size,
                            emb_dim,
                            weights=[embedding_matrix],
                            input_length=MAX_LEN,
                            trainable=False)
Is there a Keras-level solution to training only a subset of embeddings?
Please note:
Most of the time when you use embeddings, you'll use them already trained and readily available, so you won't be training them yourself. However, to understand them better, we'll mock up a dataset based on colour combinations and learn embeddings that turn a colour name into a location in both 2D and 3D space.
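To make that concrete, here is a minimal, hypothetical sketch of the 2D case (all names and sizes below are made up for illustration):

# Hypothetical sketch: learn a 2D embedding for each colour name
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

n_colour_names = 50                                   # made-up vocabulary size
model = Sequential([
    Embedding(n_colour_names, 2, input_length=1),     # colour-name id -> point in 2D space
    Flatten(),
    Dense(3)                                          # e.g. regress the colour's RGB value
])
model.compile(optimizer='adam', loss='mse')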
We load this embedding matrix into an Embedding layer. Note that we set trainable=False to prevent the weights from being updated during training. An Embedding layer should be fed sequences of integers, i.e. a 2D input of shape (samples, indices).
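As a minimal sketch (assuming embedding_matrix has already been built from the GloVe file, and MAX_LEN is the chosen sequence length; both names are placeholders):

# Frozen Embedding layer wrapping a pre-built GloVe matrix
from keras.layers import Embedding, Input

embedding_layer = Embedding(embedding_matrix.shape[0],        # vocabulary size
                            embedding_matrix.shape[1],        # embedding dimension
                            weights=[embedding_matrix],       # pre-trained vectors
                            trainable=False)                  # keep them frozen

int_sequences = Input(shape=(MAX_LEN,), dtype='int32')        # 2D input: (samples, indices)
embedded = embedding_layer(int_sequences)                     # -> (samples, MAX_LEN, emb_dim)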
Train the model: first, convert our list-of-strings data to NumPy arrays of integer indices. The arrays are right-padded. We use categorical crossentropy as our loss since we're doing softmax classification; specifically, we use sparse_categorical_crossentropy because our labels are integers rather than one-hot vectors.
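A sketch of that step (tokenizer, train_samples, train_labels and model are assumed to exist already):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

# Strings -> integer indices, right-padded to a fixed length
x_train = pad_sequences(tokenizer.texts_to_sequences(train_samples),
                        maxlen=MAX_LEN, padding='post')
y_train = np.array(train_labels)                       # integer class ids, not one-hot

model.compile(loss='sparse_categorical_crossentropy',  # integer labels -> sparse variant
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, batch_size=128, epochs=20)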
If we're in a hurry, one rule of thumb is to use the fourth root of the total number of unique categorical elements; another is that the embedding dimension should be approximately 1.6 times the square root of the number of unique elements in the category, and no more than 600.
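For example, for a hypothetical feature with 10,000 unique values the two rules give:

n_unique = 10_000
dim_rule_1 = round(n_unique ** 0.25)                   # fourth root        -> 10
dim_rule_2 = min(600, round(1.6 * n_unique ** 0.5))    # 1.6 * sqrt, capped -> 160
print(dim_rule_1, dim_rule_2)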
I found a nice workaround, inspired by Keith's idea of using two embedding layers.
Main idea:
Assign the special tokens (and the OOV token) the highest IDs. Generate a 'sentence' containing only the special tokens, zero-padded elsewhere. Then apply the non-trainable embeddings to the 'normal' sentence and the trainable embeddings to the special-token sentence. Finally, add the two.
Works fine for me.
import numpy as np
from keras.layers import Embedding, Input, Lambda, Activation, Add

# Normal embs - '+2' for the padding token and the OOV token
embedding_matrix = np.zeros((vocab_len + 2, emb_dim))
# Special embs
special_embedding_matrix = np.zeros((special_tokens_len + 2, emb_dim))

# Here we may load the pre-trained embeddings into embedding_matrix

embedding_layer = Embedding(vocab_len + 2,
                            emb_dim,
                            mask_zero=True,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LEN,
                            trainable=False)

special_embedding_layer = Embedding(special_tokens_len + 2,
                                    emb_dim,
                                    mask_zero=True,
                                    weights=[special_embedding_matrix],
                                    input_length=MAX_SENT_LEN,
                                    trainable=True)

valid_words = vocab_len - special_tokens_len

sentence_input = Input(shape=(MAX_SENT_LEN,), dtype='int32')

# Shift the IDs so that only special tokens stay positive, e.g.: [0,0,1,0,3,0,0]
special_tokens_input = Lambda(lambda x: x - valid_words)(sentence_input)
special_tokens_input = Activation('relu')(special_tokens_input)

# Apply both the 'normal' (frozen) embeddings and the trainable special-token embeddings
embedded_sequences = embedding_layer(sentence_input)
embedded_special = special_embedding_layer(special_tokens_input)

# Add the two embedding tensors
embedded_sequences = Add()([embedded_sequences, embedded_special])
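A possible continuation (not part of the original answer; num_classes is a placeholder) that feeds the combined embeddings into a downstream classifier:

from keras.layers import LSTM, Dense
from keras.models import Model

x = LSTM(128)(embedded_sequences)
preds = Dense(num_classes, activation='softmax')(x)

model = Model(sentence_input, preds)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])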
I haven't found a clean built-in solution such as a per-row mask for the Embedding layer, but here's what I've been meaning to try: use two embedding layers, one non-trainable layer holding the pre-trained vectors and a second, small trainable layer for the special symbol, and combine their outputs (see the sketch below). That would get you a solution with only a small number of free parameters allocated to those embeddings.
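One way to realize that idea, as a sketch under assumed names (voc_size, SPECIAL_ID, MAX_LEN and the random embedding_matrix are all placeholders, and this is not necessarily the exact construction the answer had in mind): keep a frozen Embedding for the whole vocabulary and add a tiny trainable correction that is non-zero only at positions holding the special symbol.

import numpy as np
from keras import backend as K
from keras.layers import Input, Embedding, Lambda, Multiply, Add
from keras.models import Model

voc_size, emb_dim, MAX_LEN = 10000, 100, 50            # hypothetical sizes
SPECIAL_ID = voc_size - 1                              # give the special symbol the last ID
embedding_matrix = np.random.rand(voc_size, emb_dim)   # stand-in for the real GloVe matrix

tokens = Input(shape=(MAX_LEN,), dtype='int32')

# Frozen GloVe embeddings for every token, including the special one
frozen = Embedding(voc_size, emb_dim,
                   weights=[embedding_matrix],
                   trainable=False)(tokens)

# Per-position flag: 1 where the token is the special symbol, 0 elsewhere
is_special = Lambda(lambda x: K.cast(K.equal(x, SPECIAL_ID), 'int32'))(tokens)

# Tiny trainable table (2 x emb_dim); row 1 becomes the learned vector for the symbol
delta = Embedding(2, emb_dim, embeddings_initializer='zeros', trainable=True)(is_special)

# Zero the correction at non-special positions, then add it on top of the frozen vectors
mask = Lambda(lambda x: K.expand_dims(K.cast(x, 'float32'), -1))(is_special)
delta = Multiply()([delta, mask])
combined = Add()([frozen, delta])

model = Model(tokens, combined)    # plug `combined` into the rest of your network instead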