Keras LSTM with masking layer for variable-length inputs

Tags: python, keras, lstm, masking

I know this topic already has many questions, but I couldn't find a solution to my problem.

I am training an LSTM network on variable-length inputs using a masking layer, but the masking seems to have no effect.

The input shape is (100, 362, 24), where 362 is the maximum sequence length, 24 is the number of features, and 100 is the number of samples (split 75 train / 25 validation).

The output shape is (100, 362, 1), later transformed to (100, 362 - N, 1).

Here is the code for my network:

from keras import Sequential
from keras.layers import Embedding, Masking, LSTM, Lambda
import keras.backend as K


#                          O O O
#   example for N:3        | | |
#                    O O O O O O
#                    | | | | | | 
#                    O O O O O O

N = 5
y = y[:, N:, :]  # drop the first N target timesteps to match the model output

x_train = x[:75]
x_test = x[75:]
y_train = y[:75]
y_test = y[75:]

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :]))

model.compile('adam', 'mae')

print(model.summary())
history = model.fit(x_train, y_train,
                    epochs=3,
                    batch_size=15,
                    validation_data=(x_test, y_test))

My data is padded at the end. Example:

>> x_test[10,350]
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
   0., 0., 0., 0., 0., 0., 0.], dtype=float32)

The problem is that the masking layer seems to have no effect. I can tell because the loss printed during training equals the unmasked loss that I compute afterwards, not the masked one:

Layer (type)                 Output Shape              Param #   
=================================================================
masking_1 (Masking)          (None, 362, 24)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 362, 128)          78336     
_________________________________________________________________
lstm_2 (LSTM)                (None, 362, 64)           49408     
_________________________________________________________________
lstm_3 (LSTM)                (None, 362, 1)            264       
_________________________________________________________________
lambda_1 (Lambda)            (None, 357, 1)            0         
=================================================================
Total params: 128,008
Trainable params: 128,008
Non-trainable params: 0
_________________________________________________________________
None
Train on 75 samples, validate on 25 samples
Epoch 1/3
75/75 [==============================] - 8s 113ms/step - loss: 0.1711 - val_loss: 0.1814
Epoch 2/3
75/75 [==============================] - 5s 64ms/step - loss: 0.1591 - val_loss: 0.1307
Epoch 3/3
75/75 [==============================] - 5s 63ms/step - loss: 0.1057 - val_loss: 0.1034

>> from sklearn.metrics import mean_absolute_error
>> out = model.predict(x_test, batch_size=1)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
wo mask 0.10343371
w mask 0.16236152

Furthermore, if I use NaN for the masked output values, I can see the NaN being propagated during training (the loss becomes NaN).

What am I missing to make the masking layer work as expected?


asked Apr 05 '18 11:04

Florian Mutel

1 Answer

The Lambda layer, by default, does not propagate masks. In other words, the mask tensor computed by the Masking layer is thrown away by the Lambda layer, and thus the Masking layer has no effect on the output loss.

If you want the compute_mask method of a Lambda layer to propagate the previous mask, you have to provide the mask argument when the layer is created. This can be seen in the source code of the Lambda layer:

def __init__(self, function, output_shape=None,
             mask=None, arguments=None, **kwargs):
    # ...
    if mask is not None:
        self.supports_masking = True
    self.mask = mask

# ...

def compute_mask(self, inputs, mask=None):
    if callable(self.mask):
        return self.mask(inputs, mask)
    return self.mask

Because the default value of mask is None, compute_mask returns None and the loss is not masked at all.
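The dispatch logic in that excerpt can be reproduced without Keras. Here is a minimal sketch using a hypothetical ToyLambda class (not the real Keras layer) that copies only the mask handling, with masks represented as plain Python lists for illustration:

```python
class ToyLambda:
    """Hypothetical stand-in that mimics only the mask handling of Lambda."""

    def __init__(self, function, mask=None):
        self.function = function
        self.mask = mask

    def compute_mask(self, inputs, mask=None):
        # Same dispatch as the excerpt: call self.mask if it is callable,
        # otherwise return it unchanged (None by default).
        if callable(self.mask):
            return self.mask(inputs, mask)
        return self.mask


N = 2
previous_mask = [True, True, True, False, False]  # mask for one padded sequence

without_mask_arg = ToyLambda(lambda x: x[N:])
with_mask_arg = ToyLambda(lambda x: x[N:], mask=lambda inputs, m: m[N:])

print(without_mask_arg.compute_mask(None, previous_mask))  # None
print(with_mask_arg.compute_mask(None, previous_mask))     # [True, False, False]
```

Without the mask argument, the incoming mask is silently dropped, so nothing downstream knows which timesteps are padding.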

To fix the problem, since your Lambda layer itself does not introduce any additional masking, the compute_mask method should just return the mask from the previous layer (with appropriate slicing to match the output shape of the layer).

masking_func = lambda inputs, previous_mask: previous_mask[:, N:]
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :], mask=masking_func))
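To see what the sliced mask looks like, here is a small NumPy sketch. It assumes the mask is True wherever a timestep has at least one feature different from mask_value, which is how the Masking layer is documented to behave. The toy shapes and array names are made up for illustration.

```python
import numpy as np

N = 2
timesteps, features = 6, 3

# Two toy sequences, zero-padded at the end (true lengths 4 and 5).
x = np.ones((2, timesteps, features), dtype="float32")
x[0, 4:] = 0.0
x[1, 5:] = 0.0

# A timestep is kept when at least one of its features differs from 0.
full_mask = np.any(x != 0.0, axis=-1)   # shape (2, 6)

# The Lambda layer drops the first N timesteps, so the mask must drop them too.
sliced_mask = full_mask[:, N:]          # shape (2, 4)

print(full_mask.shape, sliced_mask.shape)
print(sliced_mask.astype(int))
```

The sliced mask has one entry per output timestep, which is the shape needed to weight the per-timestep loss.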

Now you should be able to see the correct loss value.

>> model.evaluate(x_test, y_test, verbose=0)
0.2660679519176483
>> out = model.predict(x_test)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
wo mask 0.26519736809498456
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
w mask 0.2660679670482195
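The long indexing expression in that check can be wrapped in a small helper. This is only a sketch, assuming padded timesteps are those whose features all equal the mask value; masked_mae is a hypothetical name, not a Keras or scikit-learn function.

```python
import numpy as np


def masked_mae(x, y_true, y_pred, n_skip, mask_value=0.0):
    """Mean absolute error over the timesteps of x[:, n_skip:] that are not padding."""
    # keep[i, t] is True when timestep t of sample i has any non-padding feature.
    keep = ~np.all(x[:, n_skip:] == mask_value, axis=2)
    return float(np.mean(np.abs(y_true[keep] - y_pred[keep])))


# Toy data: 1 sample, 4 timesteps, 2 features, last timestep padded.
x = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]])
y_true = np.array([[[1.0], [2.0], [9.0]]])   # already sliced with n_skip = 1
y_pred = np.array([[[0.0], [4.0], [0.0]]])

print(masked_mae(x, y_true, y_pred, n_skip=1))  # 1.5
```

The padded last timestep (error 9.0) is excluded, so only the errors 1.0 and 2.0 are averaged.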

Using NaN values for padding does not work because masking is done by multiplying the loss tensor by a binary mask (0 * nan is still nan, so the mean would be nan).
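The NaN behaviour is easy to check with NumPy. The following is a simplified illustration with made-up per-timestep loss values, not Keras's actual loss code:

```python
import numpy as np

per_step_loss = np.array([0.2, 0.4, np.nan, np.nan])  # NaN at padded timesteps
mask = np.array([1.0, 1.0, 0.0, 0.0])

# Multiplying by the mask does not remove NaN, because 0 * nan is nan.
weighted = per_step_loss * mask
print(weighted.sum() / mask.sum())       # nan

# Selecting the valid entries instead never touches the NaN values.
print(per_step_loss[mask > 0].mean())    # approximately 0.3
```

Under this multiplication scheme, padded targets need a finite placeholder such as 0, with the mask excluding them from the average.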


answered Oct 23 '22 05:10

Yu-Yang
