I'm reading material from the TensorFlow website:
https://www.tensorflow.org/tutorials/layers
Suppose we have 10 greyscale 28x28 pixel images, and after the first convolutional and pooling layers each image has become a 14x14 volume with 32 channels.
So, if we apply a second convolutional layer (let's say 64 5x5 filters, as in the link), do we apply these filters to each channel of each image and get 10*32*64*14*14 data?
Pooling layers are used to reduce the dimensions of the feature maps. This reduces the number of parameters to learn in later layers and the amount of computation performed in the network. A pooling layer summarises the features present in each region of the feature map generated by a convolutional layer.
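To make the size reduction concrete, here is a minimal pure-Python sketch of 2x2 max pooling with stride 2. The helper name `max_pool_2x2` and the sample values are made up for illustration; a real network would use a framework's pooling layer instead.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            # Each output value summarises one non-overlapping 2x2 region.
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# Hypothetical 4x4 feature map; pooling halves both spatial dimensions.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 2, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 8]]
```

The same halving is what turns a 28x28 feature map into a 14x14 one.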
The first layer of a convolutional neural network is typically a convolutional layer. Convolutional layers apply a convolution operation to the input and pass the result to the next layer. A convolution converts all the pixels in its receptive field into a single value.
A conv layer has parameters to learn (the weights that are updated at each training step), whereas a pooling layer does not: it just applies a fixed function, e.g. the max function.
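As a rough illustration of that difference, the sketch below counts the trainable values in a convolutional layer and in a max-pooling layer. The function names are hypothetical, and the one-bias-per-filter term is an assumption (it is the common default, but the text above does not mention biases).

```python
def conv_layer_params(kernel_size, in_channels, out_channels, use_bias=True):
    """Number of trainable values in a 2-D convolutional layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    # Assumption: one bias per output channel, as most frameworks default to.
    biases = out_channels if use_bias else 0
    return weights + biases


def max_pool_params():
    """Max pooling applies a fixed function, so there is nothing to train."""
    return 0


# 64 filters of size 5x5 applied to a 32-channel input.
conv_params = conv_layer_params(kernel_size=5, in_channels=32, out_channels=64)
pool_params = max_pool_params()
print(conv_params, pool_params)  # 51264 0
```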
In a convolutional neural network, a convolutional layer is usually followed by a pooling layer. A pooling layer is usually added to speed up computation and to make some of the detected features more robust.
Yes and no. You do apply the filters to each channel and each image, but you don't get 10*32*64*14*14 output dimensions. The dimensionality of the output is going to be 10*64*14*14, because the layer specifies 64 output channels per image. In turn, the weights used for this convolution will have size 32*64*5*5 (64 5-by-5 filters for every channel of the input).
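The shape bookkeeping in this answer can be checked with a few lines of Python. This is only a sketch: `conv_output_shape` is a hypothetical helper, and it assumes a batch-first (N, C, H, W) layout, stride 1, and padding that preserves height and width.

```python
from math import prod


def conv_output_shape(input_shape, num_filters):
    """Output shape of a stride-1, size-preserving convolution (N, C, H, W)."""
    batch, _in_channels, height, width = input_shape
    # The input channels are summed over inside each filter, so they do not
    # survive as a separate axis; the channel axis becomes num_filters.
    return (batch, num_filters, height, width)


input_shape = (10, 32, 14, 14)  # 10 images, 32 channels, 14x14 each
output_shape = conv_output_shape(input_shape, num_filters=64)

actual_values = prod(output_shape)          # 10*64*14*14
guessed_values = 10 * 32 * 64 * 14 * 14     # the count proposed in the question

print(output_shape)    # (10, 64, 14, 14)
print(actual_values)   # 125440
print(guessed_values)  # 4014080
```

The guessed count is 32 times too large because it keeps the input channels as an extra axis instead of summing over them.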
No.
If you convolve and pad (ignoring the batch size) a 14x14x32 volume with a set of 64 5x5 filters, you'll end up with a 14x14x64 output volume.
Every single convolutional filter is convolved along the whole input depth. Thus, your 14x14x32 input volume is convolved with a 5x5 filter, and the output is a 14x14x1 feature map.
Then the second 5x5 filter of the stack of 64 filters is convolved with the same input volume.
The same operation is done for each one of the 64 filters, and the resulting feature maps are stacked, forming your 14x14x64 output volume.
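The procedure described in this answer can be sketched as a naive pure-Python loop. To keep it fast it uses a small, made-up 6x6x3 volume with four 3x3 filters rather than the 14x14x32 volume and 64 5x5 filters from the question; `conv2d_same` is a hypothetical helper, and like most deep-learning frameworks it computes a cross-correlation (no kernel flip) with zero padding and stride 1.

```python
def conv2d_same(volume, filters):
    """Naive stride-1, zero-padded convolution.

    volume:  nested list indexed [height][width][in_channels]
    filters: list of filters, each indexed [k][k][in_channels], k odd
    returns: nested list indexed [height][width][len(filters)]
    """
    height, width, depth = len(volume), len(volume[0]), len(volume[0][0])
    k = len(filters[0])
    pad = k // 2
    output = [[[0.0] * len(filters) for _ in range(width)] for _ in range(height)]
    for f, kernel in enumerate(filters):  # one 2-D feature map per filter
        for y in range(height):
            for x in range(width):
                total = 0.0
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        # Positions outside the volume count as zeros.
                        if 0 <= yy < height and 0 <= xx < width:
                            for c in range(depth):  # sum over the whole depth
                                total += volume[yy][xx][c] * kernel[dy][dx][c]
                output[y][x][f] = total  # feature maps stacked on the last axis
    return output


# Toy data: an all-ones 6x6x3 volume and four 3x3x3 filters, where filter f
# is filled with the constant f + 1 so the results are easy to check by hand.
volume = [[[1.0] * 3 for _ in range(6)] for _ in range(6)]
filters = [
    [[[float(f + 1)] * 3 for _ in range(3)] for _ in range(3)]
    for f in range(4)
]
output = conv2d_same(volume, filters)
print(len(output), len(output[0]), len(output[0][0]))  # 6 6 4
```

Each filter spans all input channels and yields a single feature map, so the output depth equals the number of filters, not the input depth times the number of filters.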