I would like to know more details about the merge mode when using a Bidirectional LSTM for sequence classification, especially the "concat" merge mode, which is still quite unclear to me.
From what I understood with this scheme:
The output y_t is computed by passing the merged result of the forward and backward layers through the sigmoid function. This seems intuitive for the "add", "mul" and "average" merge modes, but I don't understand how y_t is computed when the "concat" merge mode is chosen. Indeed, with this merge mode we now have a vector instead of a single value before the sigmoid function.
Suppose the input is of size n X t X f, where

n: batch size
t: sequence length (time-steps / no. of unrollings)
f: no. of features per time-step

and the Bi-LSTM is defined as below:

model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(t, f)))

Then the forward layer (LSTM1) will return output of size n X t X 10, and the backward layer (LSTM2) will return output of size n X t X 10.
merge_mode:

sum: add the LSTM1 output to the LSTM2 output at each time-step, i.e. n X t X 10 of LSTM1 + n X t X 10 of LSTM2 = output of size n X t X 10
mul: element-wise multiplication of the LSTM1 and LSTM2 outputs at each time-step, which results in output of size n X t X 10
concat: concatenation of the LSTM1 and LSTM2 outputs along the feature axis at each time-step, which results in output of size n X t X 10*2
ave: element-wise average of the LSTM1 and LSTM2 outputs at each time-step, which results in output of size n X t X 10
None: return the LSTM1 and LSTM2 outputs as a list
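The shape arithmetic above can be sketched with NumPy; the random arrays are hypothetical stand-ins for the forward (LSTM1) and backward (LSTM2) outputs:

```python
import numpy as np

# Hypothetical outputs for n=2 sequences, t=3 time-steps, 10 units per direction
fwd = np.random.rand(2, 3, 10)  # stands in for LSTM1 output
bwd = np.random.rand(2, 3, 10)  # stands in for LSTM2 output

merged_sum = fwd + bwd                               # sum    -> (2, 3, 10)
merged_mul = fwd * bwd                               # mul    -> (2, 3, 10)
merged_ave = (fwd + bwd) / 2                         # ave    -> (2, 3, 10)
merged_concat = np.concatenate([fwd, bwd], axis=-1)  # concat -> (2, 3, 20)

print(merged_sum.shape, merged_concat.shape)  # (2, 3, 10) (2, 3, 20)
```

Only concat changes the feature dimension; the other modes keep it at 10.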
No activation function is applied after combining the outputs based on merge_mode. If you want to apply an activation, you have to explicitly define it in the model as a layer.
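For example, a minimal sketch (assuming tensorflow.keras) of adding an explicit per-time-step sigmoid after the concat merge; the TimeDistributed(Dense(1, ...)) layer is one possible way to produce a single sigmoid output y_t per time-step, as in the question:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, TimeDistributed, Dense

model = Sequential()
# Bi-LSTM with concat merge: output is (None, 5, 20)
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 15), merge_mode='concat'))
# Explicit activation: the 20-dim concat vector at each time-step is
# projected to a single value and passed through the sigmoid
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
```

The sigmoid is thus applied to a scalar projection of the concat vector, not to the vector itself.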
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='concat'))
assert model.layers[-1].output_shape == (None, 5, 20)

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='sum'))
assert model.layers[-1].output_shape == (None, 5, 10)

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='mul'))
assert model.layers[-1].output_shape == (None, 5, 10)
You cannot use merge_mode=None inside a Sequential model because each layer must return a tensor, but with None the wrapper returns a list, which cannot be stacked in the model. However, you can use it with the functional API of Keras.
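A minimal functional-API sketch (assuming tensorflow.keras) where merge_mode=None hands back the two directions separately, so you can combine them however you like:

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Concatenate

inp = Input(shape=(5, 15))
# merge_mode=None returns a list: [forward output, backward output]
fwd, bwd = Bidirectional(LSTM(10, return_sequences=True), merge_mode=None)(inp)
# fwd and bwd are separate (None, 5, 10) tensors; here we merge them manually,
# reproducing what merge_mode='concat' would have done
out = Concatenate()([fwd, bwd])
model = Model(inp, out)
```

This is useful when you want a custom merge that the built-in modes don't offer.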
It is quite simple. Imagine that your forward LSTM layer returned a state like [0.1, 0.2, 0.3] and your backward LSTM layer yielded [0.4, 0.5, 0.6]. Then the concatenation (concat for brevity) is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], which is passed on to the activation layer.
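As a quick check, the same concatenation in NumPy, using the hypothetical states from above:

```python
import numpy as np

forward = np.array([0.1, 0.2, 0.3])   # hypothetical forward LSTM state
backward = np.array([0.4, 0.5, 0.6])  # hypothetical backward LSTM state

# concat merge: the two state vectors are joined end to end
merged = np.concatenate([forward, backward])
print(merged.tolist())  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```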