
How do I mask a loss function in Keras with the TensorFlow backend?

I am trying to implement a sequence-to-sequence task using an LSTM in Keras with the TensorFlow backend. The inputs are English sentences of variable length. To construct a dataset with the 2-D shape [batch_number, max_sentence_length], I add EOF at the end of each sentence and pad it with enough placeholders, e.g. '#'. Each character in the sentence is then transformed into a one-hot vector, so that the dataset has the 3-D shape [batch_number, max_sentence_length, character_number]. After the LSTM encoder and decoder layers, the softmax cross-entropy between output and target is computed.
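Roughly, the preprocessing looks like this (a simplified sketch with a toy alphabet; the helper names are just for illustration and the EOF marker is omitted):

import numpy as np

# toy alphabet for illustration: two real characters plus the padding placeholder '#'
chars = ['a', 'b', '#']
char_to_idx = {c: i for i, c in enumerate(chars)}
max_sentence_length = 5

def encode(sentence):
    # pad with '#' up to the maximum length, then one-hot encode each character
    padded = sentence.ljust(max_sentence_length, '#')
    one_hot = np.zeros((max_sentence_length, len(chars)), dtype='float32')
    for t, ch in enumerate(padded):
        one_hot[t, char_to_idx[ch]] = 1.0
    return one_hot

batch = np.stack([encode('abb'), encode('babb')])
print(batch.shape)  # (2, 5, 3) == [batch_number, max_sentence_length, character_number]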

To eliminate the padding effect in model training, masking can be applied to both the input and the loss function. Masking the input in Keras can be done with layers.core.Masking. In TensorFlow, masking the loss function can be done as described here: custom masked loss function in TensorFlow.

However, I can't find a way to do this in Keras, since a user-defined loss function in Keras only accepts the parameters y_true and y_pred. So how can I pass the true sequence_lengths to the loss function and mask it?

Besides, I found a function _weighted_masked_objective(fn) in keras/engine/training.py. Its docstring reads:

Adds support for masking and sample-weighting to an objective function.
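As far as I can tell, the wrapper works roughly like this (my own paraphrase of the idea, not the exact Keras source):

import keras.backend as K

def _weighted_masked_objective_sketch(fn):
    # wraps an objective fn(y_true, y_pred) so that it additionally accepts
    # sample weights and a mask; the per-timestep loss is multiplied by the
    # mask and renormalized by the mean of the mask
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            score_array *= mask
            score_array /= K.mean(mask)
        # (sample weighting is handled here in the real implementation)
        return K.mean(score_array)
    return weighted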

But it seems that the objective fn itself can still only accept (y_true, y_pred). Is there a way to use this function to solve my problem?

To be specific, I modified Yu-Yang's example as follows.

from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)

max_sentence_length = 5
character_number = 3  # valid characters 'a', 'b' and placeholder '#'

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)

model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
              [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]],  # the batch is ['##abb', '#babb'], padding '#'
                   [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])

y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))

# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits = tf.constant(y_pred, dtype=tf.float32)
target = tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits), axis=2))
losses = -tf.reduce_sum(target * tf.log(logits), axis=2)
sequence_lengths = tf.constant([3, 4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [0, 1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
    c_e = sess.run(cross_entropy)
    m_c_e = sess.run(masked_loss)
    print("tf unmasked_loss:", c_e)
    print("tf masked_loss:", m_c_e)

The outputs from Keras and TensorFlow are compared as follows:

[screenshot: the loss from Keras model.evaluate() compared with the TensorFlow unmasked and masked losses]

As shown above, masking is disabled after certain kinds of layers. So how can I mask the loss function in Keras when those layers are added?

Asked by Shuaaai on Nov 01 '17.




2 Answers

If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in the correct way, the loss on the padding placeholders will be ignored.

Some Details:

It's a bit involved to explain the whole process, so I'll just break it down into several steps:

  1. In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are omitted for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
    masks = [None for _ in self.outputs]
if not isinstance(masks, list):
    masks = [masks]

# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)
  2. Inside Model.compute_mask(), run_internal_graph() is called.
  3. Inside run_internal_graph(), the masks in the model are propagated layer by layer from the model's inputs to its outputs by calling Layer.compute_mask() for each layer iteratively (a small inspection sketch follows this list).
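You can inspect this yourself by calling the same method on a small model (a sketch using toy shapes, not part of your original code):

from keras.models import Model
from keras.layers import Input, Masking, LSTM

inp = Input(shape=(5, 3))
x = Masking(mask_value=0)(inp)
out = LSTM(4, return_sequences=True)(x)
model = Model(inp, out)

# Model.compute_mask() is the same call made in compile() above; with a Masking
# layer and return_sequences=True, the boolean mask reaches the model output
print(model.compute_mask(model.inputs, mask=None))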

So if you're using a Masking layer in your model, you shouldn't worry about the loss on the padding placeholders. The loss on those entries will be masked out as you've probably already seen inside _weighted_masked_objective().

A Small Example:

max_sentence_length = 5
character_number = 2

input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')

X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0.          0.          0.        ]
  [ 0.          0.          0.        ]
  [-0.11980877  0.05803877  0.07880752]
  [-0.00429189  0.13382857  0.19167568]
  [ 0.06817091  0.19093043  0.26219055]]

 [[ 0.          0.          0.        ]
  [ 0.0651961   0.10283815  0.12413475]
  [-0.04420842  0.137494    0.13727818]
  [ 0.04479844  0.17440712  0.24715884]
  [ 0.11117355  0.21645413  0.30220413]]]

# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()

print(model.evaluate(X, y_true))
0.881977558136

print(masked_loss)
0.881978

print(unmasked_loss)
0.917384

As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.


EDIT:

If there's a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). See RNN.compute_mask():

def compute_mask(self, inputs, mask):
    if isinstance(mask, list):
        mask = mask[0]
    output_mask = mask if self.return_sequences else None
    if self.return_state:
        state_mask = [None for _ in self.states]
        return [output_mask] + state_mask
    else:
        return output_mask
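A quick self-contained check (a sketch mirroring the one above) shows the mask dying at a return_sequences=False LSTM:

from keras.models import Model
from keras.layers import Input, Masking, LSTM

inp = Input(shape=(5, 3))
x = Masking(mask_value=0)(inp)
encoder_out = LSTM(10, return_sequences=False)(x)

# because return_sequences=False, compute_mask() above returns None, so
# RepeatVector, the decoder LSTM, the Dense layer and the loss see no mask
print(Model(inp, encoder_out).compute_mask([inp], mask=None))  # None (or [None])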

In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.

The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.

def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_categorical_crossentropy(y_true, y_pred):
        # find out which timesteps in `y_true` are not the padding character '#'
        mask = K.all(K.equal(y_true, mask_value), axis=-1)
        mask = 1 - K.cast(mask, K.floatx())

        # multiply categorical_crossentropy with the mask
        loss = K.categorical_crossentropy(y_true, y_pred) * mask

        # take average w.r.t. the number of unmasked entries
        return K.sum(loss) / K.sum(mask)
    return masked_categorical_crossentropy

masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')

The output of the above code then shows that the loss is computed only on the unmasked values:

model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339

The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].

Answered by Yu-Yang.


If you're not using masks as in Yu-Yang's answer, you can try this.

If your target data Y has a length dimension and is padded with the mask value, you can do the following:

import keras.backend as K

def custom_loss(yTrue, yPred):
    # find which values in yTrue (target) are the mask value
    isMask = K.equal(yTrue, maskValue)  # true for all mask values

    # since y is shaped as (batch, length, features), we need all features to be mask values
    isMask = K.all(isMask, axis=-1)  # the entire output vector must be true
        # this second line is only necessary if the output features are more than 1

    # transform to float (0 or 1) and invert
    isMask = K.cast(isMask, dtype=K.floatx())
    isMask = 1 - isMask  # now mask values are zero, and others are 1

    # multiply this by the inputs:
        # maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
    yTrue = yTrue * isMask
    yPred = yPred * isMask

    return someLossFunction(yTrue, yPred)

If you have padding only in the input data, or if Y has no length dimension, you can define your own mask outside the function:

masks = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0]
]
# shape (samples, length). If it fails, make it (samples, length, 1).

import keras.backend as K

masks = K.constant(masks)

Since the masks depend on your input data, you can use your mask value to determine where to put zeros, for example:

masks = np.array((X_train == maskValue).all(axis=-1), dtype='float64')  # reduce over the feature axis
masks = 1 - masks

# here too, if you have a problem with dimensions in the multiplications below,
# expand the masks' dimensions by adding a last dimension = 1

Then make your loss function take the masks from outside of it (you must recreate the loss function if you change the input data):

def customLoss(yTrue, yPred):
    yTrue = masks * yTrue
    yPred = masks * yPred

    return someLossFunction(yTrue, yPred)
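A hypothetical way to wire these pieces together, assuming someLossFunction is keras.losses.categorical_crossentropy and that the whole training set is fed as a single batch so that masks lines up with what the loss sees:

# illustrative usage only; model, X_train and Y_train are assumed to exist already
from keras.losses import categorical_crossentropy as someLossFunction

model.compile(loss=customLoss, optimizer='adam')
model.fit(X_train, Y_train, batch_size=X_train.shape[0], epochs=10)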

Does anyone know if Keras automatically masks the loss function? Since it provides a Masking layer and says nothing about the outputs, maybe it does so automatically?

Answered by Daniel Möller.