 

1D CNN audio in Keras

I want to implement the neural network architecture shown in the attached image: 1DCNN_model

My dataset X has shape (N_signals, 1500, 40), where 40 is the number of features I want to run the 1D convolution over. My Y has shape (N_signals, 1500, 2), and I'm working with Keras. Each 1D convolution should take one feature vector, as in this picture: 1DCNN_convolution

So it has to take one chunk of the 1500 time samples, pass it through the 1D convolutional layer (sliding along the time axis), and then feed all the output features to the LSTM layer.
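For what it's worth, the described pipeline (Conv1D sliding along time, then an LSTM, then a per-timestep output matching Y) can be sketched end to end like this. This is a minimal sketch using tf.keras; the layer widths (64 filters, 32 LSTM units) are placeholders, not values from the question:

```python
from tensorflow.keras.layers import Input, Conv1D, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model

# (timesteps, features) per signal, as described in the question
inp = Input(shape=(1500, 40))
# Conv1D slides its kernel along the time axis; padding='same' keeps 1500 steps
x = Conv1D(64, kernel_size=10, padding='same', activation='relu')(inp)
# LSTM consumes the convolved feature sequence, one vector per timestep
x = LSTM(32, return_sequences=True)(x)
# Per-timestep 2-class output, matching Y of shape (N_signals, 1500, 2)
out = TimeDistributed(Dense(2, activation='sigmoid'))(x)
model = Model(inp, out)
# model.output_shape -> (None, 1500, 2)
```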

I tried to implement the first convolutional part with the code below, but I'm not sure what it's doing. I can't understand how it can take in one chunk at a time (maybe I need to preprocess my input data first?):

from keras.layers import (Input, Conv1D, BatchNormalization,
                          MaxPooling1D, Dropout, Concatenate)
from keras.models import Model

input_shape = (None, 40)
model_input = Input(input_shape, name='input')
layer = model_input
convs = []
for i in range(n_chunks):
    conv = Conv1D(filters=40,
                  kernel_size=10,
                  padding='valid',
                  activation='relu')(layer)
    conv = BatchNormalization(axis=2)(conv)
    pool = MaxPooling1D(40)(conv)
    pool = Dropout(0.3)(pool)
    convs.append(pool)
out = Concatenate()(convs)  # Merge(mode='concat') was removed in Keras 2

conv_model = Model(inputs=model_input, outputs=out)

Any advice? Thank you very much

asked Feb 04 '18 by SilverMatt

2 Answers

Thank you very much, I modified my code in this way:

input_shape = (1500, 40)
model_input = Input(shape=input_shape, name='input')
layer = model_input
layer = Conv1D(filters=40,
               kernel_size=10,
               padding='valid',
               activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
layer = MaxPooling1D(pool_size=40,
                     padding='same')(layer)
layer = Dropout(self.params.drop_rate)(layer)
layer = LSTM(40, return_sequences=True,
             activation=self.params.lstm_activation)(layer)
layer = Dropout(self.params.lstm_dropout)(layer)
layer = Dense(40, activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
model_output = TimeDistributed(Dense(2,
                                     activation='sigmoid'))(layer)

I was actually thinking that maybe I have to permute my axes in order to make the max-pooling layer work on my 40-mel feature axis...
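That permutation idea would look something like the sketch below: MaxPooling1D always pools along axis 1, so swapping the axes makes it pool over the features instead of over time. This is a minimal sketch using tf.keras; the pool size of 4 is an arbitrary assumption (pooling over all 40 features at once would collapse them to a single value):

```python
from tensorflow.keras.layers import Input, Permute, MaxPooling1D
from tensorflow.keras.models import Model

inp = Input(shape=(1500, 40))
x = Permute((2, 1))(inp)          # (40, 1500): feature axis moves to axis 1
x = MaxPooling1D(pool_size=4)(x)  # pools over the 40 features -> (10, 1500)
x = Permute((2, 1))(x)            # swap back -> (1500, 10)
model = Model(inp, x)
# model.output_shape -> (None, 1500, 10)
```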

answered Oct 05 '22 by SilverMatt


If you want to perform an individual 1D convolution over the 40 feature channels, you should add a dimension to your input:

(1500,40,1)

If you perform a 1D convolution on an input with shape

(1500,40)

the filters are applied along the time dimension, and the pictures you posted indicate that this is not what you want to do.
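One way to act on that extra dimension is sketched below, as an assumption on top of this answer: after expanding the input to (1500, 40, 1), a Conv2D with a (10, 1) kernel slides along the time axis only, so each of the 40 feature columns is convolved independently. The filter count of 16 is arbitrary:

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Hypothetical batch of 8 signals: (batch, 1500 timesteps, 40 features)
X = np.random.rand(8, 1500, 40).astype('float32')
X = np.expand_dims(X, -1)         # add channel dim -> (8, 1500, 40, 1)

inp = Input(shape=(1500, 40, 1))
# (10, 1) kernel: convolves 10 timesteps but only 1 feature at a time
out = Conv2D(filters=16, kernel_size=(10, 1), padding='valid',
             activation='relu')(inp)
model = Model(inp, out)
# model.output_shape -> (None, 1491, 40, 16)
```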

answered Oct 05 '22 by BGraf