I was trying to port CRNN model to Keras.
But, I got stuck while connecting output of Conv2D layer to LSTM layer.
Output from CNN layer will have a shape of ( batch_size, 512, 1, width_dash) where first one depends on batch_size, and last one depends on input width of input ( this model can accept variable width input )
For eg: an input with shape [2, 1, 32, 829] was resulting output with shape of (2, 512, 1, 208)
Now, as per Pytorch model, we have to do squeeze(2) followed by permute(2, 0, 1) it will result a tensor with shape [208, 2, 512 ]
I was trying to implement this is Keras, but I was not able to do that because, in Keras we can not alter batch_size dimension in a keras.models.Sequential model
Can someone please guide me how to port above part of this model to Keras?
Current state of ported CNN layer
CNN is combined with RNN to extract the correlation characteristics of different RNN models while RNNs running along the time steps. This new architecture not only has the depth of RNN in the time dimension, but also has the width of the number of temporal data.
CRNN (Convolutional Recurrent Neural Network) model that. feeds every window frame by frame into a recurrent layer and. use the outputs and hidden states of the recurrent units in each. frame for extracting features from the sequential windows.
The Convolutional Recurrent Neural Networks is the combination of two of the most prominent neural networks. The CRNN (convolutional recurrent neural network) involves CNN(convolutional neural network) followed by the RNN(Recurrent neural networks).
Abstract: We introduce a convolutional recurrent neural network (CRNN) for music tagging. CRNNs take advantage of convolutional neural networks (CNNs) for local feature extraction and recurrent neural networks for temporal summarisation of the extracted features.
You don't need to permute the batch axis in Keras. In a pytorch model you need to do it because a pytorch LSTM expects an input shape (seq_len, batch, input_size)
. However in Keras, the LSTM
layer expects (batch, seq_len, input_size)
.
So after defining the CNN and squeezing out axis 2, you just need to permute the last two axes. As a simple example (in 'channels_first'
Keras image format),
model = Sequential()
model.add(Conv2D(512, 3, strides=(32, 4), padding='same', input_shape=(1, 32, None)))
model.add(Reshape((512, -1)))
model.add(Permute((2, 1)))
model.add(LSTM(32))
You can verify the shapes with model.summary()
:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 512, 1, None) 5120
_________________________________________________________________
reshape_3 (Reshape) (None, 512, None) 0
_________________________________________________________________
permute_4 (Permute) (None, None, 512) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 32) 69760
=================================================================
Total params: 74,880
Trainable params: 74,880
Non-trainable params: 0
_________________________________________________________________
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With