Keras conv1d layer parameters: filters and kernel_size

Tags:

convolution

I am very confused by these two parameters in the Conv1D layer from Keras: https://keras.io/layers/convolutional/#conv1d

the documentation says:

filters: Integer, the dimensionality of the output space (i.e. the number output of filters in the convolution).

kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.

But that does not seem to relate to the standard terminology I see in many tutorials such as https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/ and https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/

Using the second tutorial link, which uses Keras, I'd imagine that 'kernel_size' in fact corresponds to the conventional 'filter' concept, which defines the sliding window on the input feature space. But what about the 'filters' parameter in Conv1D? What does it do?

For example, in the following code snippet:

model.add(embedding_layer)
model.add(Dropout(0.2))
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))

Suppose the embedding layer outputs a matrix of dimension 50 (rows, each row is a word in a sentence) x 300 (columns, the word vector dimension). How does the Conv1D layer transform that matrix?

Many thanks

asked Sep 30 '17 14:09

Ziqi

1 Answer

You're right to say that kernel_size defines the size of the sliding window.

The filters parameter is simply how many different windows you will have (all of them with the same length, which is kernel_size). In other words, it is how many different results, or channels, you want to produce.

When you use filters=100 and kernel_size=4, you are creating 100 different filters, each of length 4. The result is 100 different convolutions, one output channel per filter.

Also, each filter has enough parameters to consider all input channels.
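As a rough check on that statement, the number of weights can be counted by hand. This sketch assumes the 300-dimensional embedding from the question and Keras's default of one bias per filter (use_bias=True); the variable names are only illustrative.

```python
# Hand count of the trainable parameters in
# Conv1D(filters=100, kernel_size=4) applied to 300 input channels.
kernel_size = 4       # window length, in words
input_channels = 300  # embedding dimension (from the question)
filters = 100         # number of output channels

# Each filter spans the whole window across every input channel.
weights_per_filter = kernel_size * input_channels  # 1200
kernel_weights = weights_per_filter * filters      # 120000
biases = filters                                   # assumes use_bias=True (the Keras default)
total_params = kernel_weights + biases

print(total_params)  # 120100
```

So a single filter is not 4 numbers but 4 x 300 = 1200 numbers, which is what lets it look at every channel of every word in its window.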

The Conv1D layer expects these dimensions:

(batchSize, length, channels)

I suppose the best way to use it is to have the number of words in the length dimension (as if the words in order formed a sentence), and the channels be the output dimension of the embedding (numbers that define one word).

So:

batchSize = number of sentences
length = number of words in each sentence
channels = dimension of the embedding's output

The convolutional layer applies 100 different filters. Each filter slides along the length dimension (word by word, in groups of 4) and considers all the channels that define each word.
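The sliding described above can be sketched in plain NumPy for a single sentence. This is only an illustration, not how Keras implements the layer: the weights are random stand-ins for learned ones, and the split of the padding='same' zeros (one row before, two after) is an assumption based on TensorFlow's convention of putting the extra row at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

length, channels = 50, 300   # one sentence: 50 words x 300-dim word vectors
kernel_size, filters = 4, 100

sentence = rng.normal(size=(length, channels))
kernel = rng.normal(size=(kernel_size, channels, filters))  # stand-in for learned weights
bias = np.zeros(filters)

# padding='same' at stride 1 needs kernel_size - 1 zero rows in total.
# Assumption: the extra row goes at the end, as in TensorFlow's convention.
pad_total = kernel_size - 1
pad_before = pad_total // 2
pad_after = pad_total - pad_before
padded = np.pad(sentence, ((pad_before, pad_after), (0, 0)))

output = np.empty((length, filters))
for position in range(length):
    # Four consecutive words, all 300 channels each: shape (4, 300).
    window = padded[position:position + kernel_size]
    # Every filter multiplies the whole window and sums it to a single number.
    output[position] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias

output = np.maximum(output, 0.0)  # activation='relu'
print(output.shape)  # (50, 100)
```

Each of the 50 positions produces 100 numbers, one per filter, which is where the 50 x 100 output for one sentence comes from.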

Because the snippet uses padding='same', the length of 50 is preserved, so the outputs are shaped as:

(number of sentences, 50 words, 100 output dimension or filters)
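The length of 50 in that shape depends on the padding mode. As a small sketch, assuming a stride of 1 and no dilation (the Keras defaults), the output length can be computed with a hypothetical helper like this:

```python
def conv1d_output_length(length, kernel_size, padding):
    """Output length of a 1D convolution with stride 1 and no dilation."""
    if padding == "same":
        # Zero padding is added so the output is as long as the input.
        return length
    if padding == "valid":
        # No padding: the window must fit entirely inside the input.
        return length - kernel_size + 1
    raise ValueError(f"unsupported padding: {padding!r}")


print(conv1d_output_length(50, 4, "same"))   # 50
print(conv1d_output_length(50, 4, "valid"))  # 47
```

With padding='valid', which is the Keras default, the same layer would return 47 positions per sentence instead of 50. The last dimension stays at 100 either way, because it is set by filters.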

The filters are shaped as:

(4 = kernel length, 300 = word vector dimension, 100 = output dimension of the convolution)

answered Oct 03 '22 21:10

Daniel Möller
