 

1D CNN audio in Keras

I want to implement the neural network architecture shown in the attached image: 1DCNN_model

My dataset X has shape (N_signals, 1500, 40), where 40 is the number of features I want to run the 1D convolution over. My Y has shape (N_signals, 1500, 2), and I'm working with Keras. Each 1D convolution should take one feature vector, as in this picture: 1DCNN_convolution

So it has to take one chunk of the 1500 time samples, pass it through the 1D convolutional layer (sliding along the time axis), and then feed all the output features to the LSTM layer.
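For what it's worth, the described pipeline (Conv1D sliding along time, then an LSTM, then a per-timestep output matching Y) can be sketched end to end like this. This is a minimal sketch using tf.keras; the layer widths (64 filters, 32 LSTM units) are placeholders, not values from the question:

```python
from tensorflow.keras.layers import Input, Conv1D, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model

# (timesteps, features) per signal, as described in the question
inp = Input(shape=(1500, 40))
# Conv1D slides its kernel along the time axis; padding='same' keeps 1500 steps
x = Conv1D(64, kernel_size=10, padding='same', activation='relu')(inp)
# LSTM consumes the convolved feature sequence, one vector per timestep
x = LSTM(32, return_sequences=True)(x)
# Per-timestep 2-class output, matching Y of shape (N_signals, 1500, 2)
out = TimeDistributed(Dense(2, activation='sigmoid'))(x)
model = Model(inp, out)
# model.output_shape -> (None, 1500, 2)
```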

I tried to implement the first convolutional part with the code below, but I'm not sure what it's doing. I can't understand how it can take in one chunk at a time (maybe I need to preprocess my input data first?):

from keras.layers import (Input, Conv1D, BatchNormalization,
                          MaxPooling1D, Dropout, Concatenate)
from keras.models import Model

input_shape = (None, 40)
model_input = Input(input_shape, name='input')
layer = model_input
convs = []
for i in range(n_chunks):
    conv = Conv1D(filters=40,
                  kernel_size=10,
                  padding='valid',
                  activation='relu')(layer)
    conv = BatchNormalization(axis=2)(conv)
    pool = MaxPooling1D(40)(conv)
    pool = Dropout(0.3)(pool)
    convs.append(pool)
out = Concatenate()(convs)  # Merge(mode='concat') was removed in Keras 2

conv_model = Model(inputs=model_input, outputs=out)

Any advice? Thank you very much

asked Feb 04 '18 by SilverMatt

2 Answers

Thank you very much, I modified my code in this way:

input_shape = (1500, 40)
model_input = Input(shape=input_shape, name='input')
layer = model_input
layer = Conv1D(filters=40,
               kernel_size=10,
               padding='valid',
               activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
layer = MaxPooling1D(pool_size=40,
                     padding='same')(layer)
layer = Dropout(self.params.drop_rate)(layer)
layer = LSTM(40, return_sequences=True,
             activation=self.params.lstm_activation)(layer)
layer = Dropout(self.params.lstm_dropout)(layer)
layer = Dense(40, activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
model_output = TimeDistributed(Dense(2,
                                     activation='sigmoid'))(layer)

I was actually thinking that maybe I have to permute my axes in order to make the max-pooling layer work on my 40-mel feature axis...
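That permutation idea would look something like the sketch below: MaxPooling1D always pools along axis 1, so swapping the axes makes it pool over the features instead of over time. This is a minimal sketch using tf.keras; the pool size of 4 is an arbitrary assumption (pooling over all 40 features at once would collapse them to a single value):

```python
from tensorflow.keras.layers import Input, Permute, MaxPooling1D
from tensorflow.keras.models import Model

inp = Input(shape=(1500, 40))
x = Permute((2, 1))(inp)          # (40, 1500): feature axis moves to axis 1
x = MaxPooling1D(pool_size=4)(x)  # pools over the 40 features -> (10, 1500)
x = Permute((2, 1))(x)            # swap back -> (1500, 10)
model = Model(inp, x)
# model.output_shape -> (None, 1500, 10)
```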

answered Oct 05 '22 by SilverMatt


If you want to perform an individual 1D convolution over the 40 feature channels, you should add a dimension to your input:

(1500,40,1)

If you perform a 1D convolution on an input with shape

(1500,40)

the filters are applied along the time dimension, and the pictures you posted indicate that this is not what you want to do.
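One way to act on that extra dimension is sketched below, as an assumption on top of this answer: after expanding the input to (1500, 40, 1), a Conv2D with a (10, 1) kernel slides along the time axis only, so each of the 40 feature columns is convolved independently. The filter count of 16 is arbitrary:

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Hypothetical batch of 8 signals: (batch, 1500 timesteps, 40 features)
X = np.random.rand(8, 1500, 40).astype('float32')
X = np.expand_dims(X, -1)         # add channel dim -> (8, 1500, 40, 1)

inp = Input(shape=(1500, 40, 1))
# (10, 1) kernel: convolves 10 timesteps but only 1 feature at a time
out = Conv2D(filters=16, kernel_size=(10, 1), padding='valid',
             activation='relu')(inp)
model = Model(inp, out)
# model.output_shape -> (None, 1491, 40, 16)
```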

answered Oct 05 '22 by BGraf