What's the difference between those two? It would also help to explain in the more general context of convolutional networks.
Also, as a side note, what are channels? In other words, please break down the 3 terms for me: channels vs filters vs kernel.
kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions. strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width.
A filter or a kernel in a conv2D layer has a height and a width. They are generally smaller than the input image and so we move them across the whole image. The area where the filter is on the image is called the receptive field.
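To make the sliding concrete, here is a minimal sketch of how kernel_size and strides decide how many positions the filter visits along one axis. It assumes no padding (what Keras calls "valid" padding); the helper names conv_output_size and window_starts are made up for illustration and are not part of any library.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Number of positions a window of kernel_size fits along one axis.

    Assumes no padding: the window must lie entirely inside the input.
    """
    if kernel_size > input_size:
        raise ValueError("kernel is larger than the input")
    return (input_size - kernel_size) // stride + 1


def window_starts(input_size, kernel_size, stride=1):
    """Start index of each receptive field along one axis."""
    count = conv_output_size(input_size, kernel_size, stride)
    return [i * stride for i in range(count)]


# A 3x3 kernel moved over a 7x7 image with stride 2.
out_h = conv_output_size(7, 3, stride=2)
out_w = conv_output_size(7, 3, stride=2)
starts = window_starts(7, 3, stride=2)

print(out_h, out_w)  # 3 3
print(starts)        # [0, 2, 4]
```

Each start index marks the top (or left) edge of one receptive field, so the output here is a 3x3 grid.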
conv1d is used when you slide your convolution kernels along 1 dimension (i.e. you reuse the same weights, sliding them along 1 dimension), whereas tf.layers.conv2d is used when you slide your convolution kernels along 2 dimensions (i.e. you reuse the same weights, sliding them along 2 dimensions).
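The difference is easier to see in plain Python. This is only a sketch, assuming stride 1, no padding and a single channel; the functions below are hand-written stand-ins for illustration, not the TensorFlow ops themselves. Like deep learning frameworks, they do not flip the kernel (strictly speaking this is cross-correlation).

```python
def conv1d(signal, kernel):
    """Slide `kernel` along the single axis of `signal`."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv2d(image, kernel):
    """Slide `kernel` along both the rows and the columns of `image`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


signal = [1, 2, 3, 4, 5]
result_1d = conv1d(signal, [1, 0, -1])

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
result_2d = conv2d(image, [[1, 0],
                           [0, 1]])

print(result_1d)  # [-2, -2, -2]
print(result_2d)  # [[6, 8], [12, 14]]
```

In both cases the same kernel weights are reused at every position; the only difference is whether the window moves along one axis or two.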
They are the same! A filter or kernel is simply a group of weights shared across the whole input space.
Each convolution layer consists of several convolution channels (aka. depth or filters). In practice, the number of filters is a value such as 64, 128, 256 or 512. This is equal to the number of channels in the output of the convolutional layer. kernel_size, on the other hand, is the size of each of these convolution filters. In practice, it takes values such as 3x3, 1x1 or 5x5. Since kernels are mostly square in practice, these can be abbreviated to 3, 1 or 5.
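One way to see how filters and kernel_size interact is to count the weights a 2D convolution layer would hold. The sketch below assumes a standard (non-grouped, non-dilated) convolution with one bias per filter; conv2d_param_count is a hypothetical helper written for this answer, not a Keras function.

```python
def conv2d_param_count(in_channels, filters, kernel_size, use_bias=True):
    """Trainable parameters in a standard 2D convolution layer.

    kernel_size may be a single int (square kernel) or a (height, width) pair,
    mirroring the abbreviation described above.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    kh, kw = kernel_size
    # Every filter has one kh x kw slice of weights per input channel.
    weights = kh * kw * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# 64 filters of size 3x3 applied to an RGB image (3 input channels).
first_layer = conv2d_param_count(in_channels=3, filters=64, kernel_size=3)
# 128 filters of size 1x1 applied to the 64 channels produced above.
second_layer = conv2d_param_count(in_channels=64, filters=128, kernel_size=(1, 1))

print(first_layer)   # 1792
print(second_layer)  # 8320
```

Note that the second layer's in_channels equals the first layer's filters: the number of filters in one layer becomes the number of channels seen by the next.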
Edit
The following quote should make it clearer.
Discussion on vlfeat
Suppose X is an input with size W x H x D x N (where N is the size of the batch) to a convolutional layer containing filter F (with size FW x FH x FD x K) in a network.
The number of feature channels D is the third dimension of the input X here (for example, this is typically 3 at the first input to the network if the input consists of colour images).
The number of filters K is the fourth dimension of F.
The two concepts are closely linked because if the number of filters in a layer is K, it produces an output with K feature channels. So the input to the next layer will have K feature channels.
The FW x FH above is the filter size you are looking for.
Added
You should be familiar with filters. You can think of each filter as responsible for extracting some type of feature from a raw image. A CNN learns these filters: the filter weights are parameters that are learned during training. In a Conv2D layer, each filter is applied across all input channels and the results are summed to give one output channel. So the number of filters and the number of output channels are the same.
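Here is a rough pure-Python sketch of that last point, assuming stride 1, no padding, no bias and a channels-last layout (x[row][col][channel]). The function conv2d_multichannel and the toy filters are invented for illustration; a real layer would learn the filter values.

```python
def conv2d_multichannel(x, filters):
    """Apply K filters to an input with D channels.

    x:       nested lists indexed [row][col][channel], D channels
    filters: nested lists indexed [filter][row][col][channel], K filters
    returns: nested lists indexed [row][col][filter], K output channels
    """
    height, width, depth = len(x), len(x[0]), len(x[0][0])
    fh, fw, fd = len(filters[0]), len(filters[0][0]), len(filters[0][0][0])
    if fd != depth:
        raise ValueError("filter depth must match the input channel count")

    out = []
    for r in range(height - fh + 1):
        out_row = []
        for c in range(width - fw + 1):
            # One number per filter: sum over the window AND over all channels.
            out_row.append([
                sum(
                    x[r + i][c + j][d] * f[i][j][d]
                    for i in range(fh)
                    for j in range(fw)
                    for d in range(depth)
                )
                for f in filters
            ])
        out.append(out_row)
    return out


# 4x4 input with D = 3 channels, every value 1.0.
x = [[[1.0] * 3 for _ in range(4)] for _ in range(4)]
# K = 2 toy filters, each of size 3x3x3.
ones_filter = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
half_filter = [[[0.5] * 3 for _ in range(3)] for _ in range(3)]

y = conv2d_multichannel(x, [ones_filter, half_filter])

print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0][0])                          # [27.0, 13.5]
```

The input has 3 channels but the output has 2, one per filter, which is why the number of filters determines the number of output channels.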