
Understanding the dimensions of a fully-connected layer that follows a max-pooling layer [closed]

In the diagram (architecture) below, how was the fully-connected (dense) layer of 4096 units derived from the last max-pool layer (on the right), which has dimensions 256x13x13? Instead of 4096, shouldn't it be 256*13*13 = 43264?

[Figure: Convolutional Neural Network architecture]

asked Mar 11 '17 by xlax


People also ask

How does Max pooling change dimensions?

Max pooling is a type of operation that is typically added to CNNs following individual convolutional layers. When added to a model, max pooling reduces the dimensionality of images by reducing the number of pixels in the output from the previous convolutional layer.

What is the dimensionality of the output of the Max pooling layer?

The output size of a max-pooling layer is calculated the same way as for a conv layer. Here the kernel size of the max-pooling layer is (2,2) and the stride is 2, so for a 28x28 input the output size is (28-2)/2 + 1 = 14.
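
As a quick check, here is a minimal PyTorch sketch that reproduces this calculation (the 1x1x28x28 input is a hypothetical example, not from the diagram above):

    import torch
    import torch.nn as nn

    pool = nn.MaxPool2d(kernel_size=2, stride=2)
    x = torch.randn(1, 1, 28, 28)   # (batch, channels, height, width)
    print(pool(x).shape)            # torch.Size([1, 1, 14, 14])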

How do you calculate the fully connected layer?

To calculate it, we have to start with the size of the input image and calculate the size of each convolutional layer. In the simple case, the output size of a CNN layer is calculated as input_size - (filter_size - 1). For example, if the input image size is (50,50) and the filter is (3,3), then 50 - (3-1) = 48.
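
A tiny helper makes this rule concrete (the name conv_output_size is illustrative, not from any library):

    def conv_output_size(input_size: int, filter_size: int) -> int:
        # valid convolution, stride 1, no padding
        return input_size - (filter_size - 1)

    print(conv_output_size(50, 3))  # 48, matching the example above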

How pooling layers reduce the dimensions of the data?

Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters to learn and the amount of computation performed in the network. The pooling layer summarises the features present in a region of the feature map generated by a convolution layer.


2 Answers

If I understand correctly, you're asking why the 4096x1x1 layer is so much smaller.

That's because it's a fully-connected layer. Every neuron from the last max-pooling layer (= 256*13*13 = 43264 neurons) is connected to every neuron of the fully-connected layer.

This is an example of an all-to-all (fully) connected neural network:

[Figure: fully-connected network with layers of different sizes]

As you can see, layer2 is bigger than layer3. That doesn't mean they can't connect.

There is no conversion of the last max-pooling layer -> all the neurons in the max-pooling layer are simply connected to all 4096 neurons in the next layer.

The 'dense' operation just means multiplying the inputs by the weights of all these connections (= 4096 * 43264 connections in total) and adding each neuron's bias to calculate the next output.
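
For illustration, a minimal NumPy sketch of this dense operation using the numbers above (note that the weight matrix alone is about 700 MB at float32, so shrink the sizes if you want to experiment):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(43264, dtype=np.float32)          # flattened 256*13*13 max-pool output
    W = rng.standard_normal((4096, 43264), dtype=np.float32)  # one weight per connection
    b = rng.standard_normal(4096, dtype=np.float32)           # one bias per output neuron

    y = W @ x + b        # the 'dense' operation
    print(y.shape)       # (4096,)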

It's connected the same way as an MLP.

But why 4096? There is no special reasoning; it's just a choice. It could have been 8000, it could have been 20; it just depends on what works best for the network.

answered Sep 22 '22 by Thomas Wagenaar


You are right that the last convolutional layer has 256 x 13 x 13 = 43264 neurons. However, it is followed by a max-pooling layer with pool_size = 3 and stride = 2, which produces an output of size 256 x 6 x 6. You connect this to a fully-connected layer. To do that, you first have to flatten the output, which will take the shape 256 x 6 x 6 = 9216 x 1. To map 9216 neurons to 4096 neurons, we introduce a 9216 x 4096 weight matrix W as the weight of the dense/fully-connected layer. Therefore, W^T * x = [9216 x 4096]^T * [9216 x 1] = [4096 x 1]. In short, each of the 9216 neurons is connected to all 4096 neurons. That is why the layer is called a dense or fully-connected layer.
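
Here is a short PyTorch sketch of these shapes, assuming AlexNet-style overlapping pooling (kernel_size=3, stride=2):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 256, 13, 13)               # output of the last conv layer
    pool = nn.MaxPool2d(kernel_size=3, stride=2)  # 13x13 -> 6x6
    fc = nn.Linear(256 * 6 * 6, 4096)             # 9216 -> 4096

    h = torch.flatten(pool(x), start_dim=1)       # (1, 9216)
    print(fc(h).shape)                            # torch.Size([1, 4096])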

As others have said above, there is no hard rule about why this should be 4096. The dense layer just has to have enough neurons to capture the variability of the entire dataset. The dataset under consideration - ImageNet 1K - is quite difficult and has 1000 categories, so starting with 4096 neurons does not seem like too many.

answered Sep 18 '22 by Autonomous