Keras TimeDistributed Not Masking CNN Model

Question

For the sake of example, I have an input consisting of 2 images,of total shape (2,299,299,3). I'm trying to apply inceptionv3 on each image, and then subsequently process the output with an LSTM. I'm using a masking layer to exclude a blank image from being processed (specified below).

The code is:

import numpy as np
from keras import backend as K
from keras.models import Sequential,Model
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, BatchNormalization, \
Input, GlobalAveragePooling2D, Masking,TimeDistributed, LSTM,Dense,Flatten,Reshape,Lambda, Concatenate
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import inception_v3

IMG_SIZE=(299,299,3)
def create_base():
    base_model = inception_v3.InceptionV3(weights='imagenet', include_top=False)
    x = GlobalAveragePooling2D()(base_model.output)
    base_model=Model(base_model.input,x)
    return base_model


base_model=create_base()

#Image mask to ignore images with pixel values of -1
IMAGE_MASK = -2*np.expand_dims(np.ones(IMG_SIZE),0)

final_input=Input((2,IMG_SIZE[0],IMG_SIZE[1],IMG_SIZE[2]))

final_model = Masking(mask_value = -2.)(final_input)
final_model = TimeDistributed(base_model)(final_model)
final_model = Lambda(lambda x: x, output_shape=lambda s:s)(final_model)
#final_model = Reshape(target_shape=(2, 2048))(final_model)
#final_model = Masking(mask_value = 0.)(final_model)
final_model = LSTM(5,return_sequences=False)(final_model)
final_model = Model(final_input,final_model)


#Create a sample test image
TEST_IMAGE = np.ones(IMG_SIZE)

#Create a test sample input, consisting of a normal image and a masked image
TEST_SAMPLE = np.concatenate((np.expand_dims(TEST_IMAGE,axis=0),IMAGE_MASK))



inp = final_model.input                                           # input placeholder
outputs = [layer.output for layer in final_model.layers]          # all layer outputs
functors = [K.function([inp]+ [K.learning_phase()], [out]) for out in outputs]
layer_outs = [func([np.expand_dims(TEST_SAMPLE,0), 1.]) for func in functors]

This does not work correctly. Specifically, the model should mask the IMAGE_MASK part of the input, but it instead processes it with inception (giving a nonzero output). here are the details:

layer_out[-1] , the LSTM output is fine:

[array([[-0.15324114, -0.09620268, -0.01668587, 0.07938149, -0.00757846]], dtype=float32)]

layer_out[-2] and layer_out[-3] , the LSTM input is wrong, it should have all zeros in the second array:

[array([[[ 0.37713543, 0.36381325, 0.36197218, ..., 0.23298527, 0.43247852, 0.34844452], [ 0.24972123, 0.2378867 , 0.11810347, ..., 0.51930511, 0.33289322, 0.33403745]]], dtype=float32)]

layer_out[-4], the input to the CNN is correctly masked:

[[ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           ..., 
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.]]],


         [[[-0., -0., -0.],
           [-0., -0., -0.],
           [-0., -0., -0.],
           ..., 
           [-0., -0., -0.],
           [-0., -0., -0.],
           [-0., -0., -0.]],

Note that the code seems to work correctly with a simpler base_model such as:

def create_base():
    input_layer=Input(IMG_SIZE)
    base_model=Flatten()(input_layer)
    base_model=Dense(2048)(base_model)
    base_model=Model(input_layer,base_model)
    return base_model

I have exhausted most online resources on this. Permutations of this question have been asked on Keras's github, such as here, here and here, but I can't seem to find any concrete resolution.

The links suggest that the issues seem to be stemming from a combination of TimeDistributed being applied to BatchNormalization, and the hacky fixes of either the Lambda identity layer, or Reshape layers remove errors but don't seem to output the correct model.

I've tried to force the base model to support masking via:

base_model.__setattr__('supports_masking',True)

and I've also tried applying an identity layer via:

TimeDistributed(Lambda(lambda x: base_model(x), output_shape=lambda s:s))(final_model)

but none of these seem to work. Note that I would like the final model to be trainable, in particular the CNN part of it should remain trainable.

nuric · Accepted Answer

It seems to be working as intended. Masking in Keras doesn't produce zeros as you would expect, it instead skips the timesteps that are masked in upstream layers such as LSTM and loss calculation. In case of RNNs, Keras (at least tensorflow) is implemented such that the states from the previous step are carried over, tensorflow_backend.py. This is done in part to preserve the shapes of tensors when dynamic input is given.

If you really want zeros you will have to implement your own layer with a similar logic to Masking and return zeros explicitly. To solve your problem, you need a mask before the final LSTM layer using the final_input:

class MyMask(Masking):
   """Layer that adds a mask based on initial input."""
   def compute_mask(self, inputs, mask=None):
      # Might need to adjust shapes
      return K.any(K.not_equal(inputs[0], self.mask_value), axis=-1)

   def call(self, inputs):
      # We just return input back
      return inputs[1]

   def compute_output_shape(self, input_shape):
     return input_shape[1]
final_model = MyMask(mask_value=-2.)([final_input, final_model])

You probably can attach the mask in a simpler manner but this custom class essentially adds a mask based on your initial inputs and outputs a Keras tensor that now has a mask.

Your LSTM will ignore in your example the second image. To confirm you can return_sequences=Trueand check that the output for 2 images are identical.

Daniel Möller · Answer

Not entirely sure this will work, but based on the comment made here, with a newer version of tensorflow + keras it should work:

final_model = TimeDistributed(Flatten())(final_input)
final_model = Masking(mask_value = -2.)(final_model)
final_model = TimeDistributed(Reshape(IMG_SIZE))(final_model)
final_model = TimeDistributed(base_model)(final_model)
final_model = Model(final_input,final_model)

I took a look at the source code of masking, and I noticed Keras creates a mask tensor that only reduces the last axis. As long as you're dealing with 5D tensors, it will cause no problem, but when you reduce the dimensions for the LSTM, this masking tensor becomes incompatible.

Doing the first flatten step, before masking, will assure that the masking tensor works properly for 3D tensors. Then you expand the image again to its original size.

I'll probably try to install newer versions soon to test it myself, but these installing procedures have caused too much trouble and I'm in the middle of something important here.

On my machine, this code compiles, but that strange error appears in prediction time (see link at the first line of this answer).

Creating a model for predicting the intermediate layers

I'm not sure, by the code I've seen, that the masking function is kept internally in tensors. I don't know exactly how it works, but it seems to be managed separately from the building of the functions inside the layers.

So, try using a keras standard model to make the predictions:

inp = final_model.input                                           # input placeholder
outputs = [layer.output for layer in final_model.layers]          # all layer outputs

fullModel = Model(inp,outputs)
layerPredictions = fullModel.predict(np.expand_dims(TEST_SAMPLE,0))

print(layerPredictions[-2])

Keras TimeDistributed Not Masking CNN Model

Tags:

tensorflow

deep-learning

keras

conv-neural-network

Alex R.

2 Answers

nuric

Creating a model for predicting the intermediate layers

Daniel Möller

Recent Activity

Donate For Us

Keras TimeDistributed Not Masking CNN Model

Tags:

tensorflow

deep-learning

keras

conv-neural-network

Alex R.

2 Answers

nuric

Creating a model for predicting the intermediate layers

Daniel Möller

Related questions

Recent Activity

Donate For Us