In the diagram (architecture) below, how was the (fully-connected) dense layer of 4096 units derived from the last max-pool layer (on the right) of dimensions 256x13x13? Instead of 4096, shouldn't it be 256*13*13 = 43264?
Max pooling is a type of operation that is typically added to CNNs following individual convolutional layers. When added to a model, max pooling reduces the dimensionality of images by reducing the number of pixels in the output from the previous convolutional layer.
The convolutional layer is followed by a max-pooling layer, whose output size is calculated the same way as a conv layer's. For example, with an input of size 28, a kernel size of (2,2), and a stride of 2, the output size is (28-2)/2 + 1 = 14.
To calculate it, we have to start with the size of the input image and work out the size of each convolutional layer in turn. In the simple case (stride 1, no padding), the output size of a CNN layer is input_size - (filter_size - 1). For example, if the input image size is (50,50) and the filter is (3,3), then 50 - (3-1) = 48.
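Here is a minimal Python sketch of that size formula (the function name is illustrative only; with stride s and no padding the general form is (input - kernel) // s + 1, which reduces to input - (filter - 1) when s = 1):

```python
# Minimal sketch of the output-size formula for a conv or pooling layer
# with no padding. Reproduces the two worked examples above.
def output_size(input_size, kernel_size, stride=1):
    return (input_size - kernel_size) // stride + 1

print(output_size(50, 3))            # conv example: 50 - (3 - 1) = 48
print(output_size(28, 2, stride=2))  # max-pool example: (28 - 2)/2 + 1 = 14
```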
Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters to learn and the amount of computation performed in the network. The pooling layer summarises the features present in a region of the feature map generated by a convolution layer.
If I understand correctly, you're asking why the 4096x1x1 layer is so much smaller.
That's because it's a fully-connected layer. Every neuron from the last max-pooling layer (256*13*13 = 43264 neurons) is connected to every neuron of the fully-connected layer.
This is an example of an all-to-all connected neural network: one layer can be bigger than the next, but that doesn't mean they can't connect.
There is no "conversion" of the last max-pooling layer: all the neurons in the max-pooling layer are simply connected to all 4096 neurons in the next layer. The 'dense' operation just means computing a weighted sum over all these connections (4096 * 43264 weights) and adding each neuron's bias to produce the next output.
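As a rough NumPy sketch (random values, shapes matching the numbers above), the dense operation is just a matrix-vector product plus a bias:

```python
import numpy as np

# Sketch of what the 'dense' operation computes, with the shapes from the
# discussion above (43264 inputs -> 4096 outputs). Note: W alone holds
# 4096 * 43264 ~ 177M weights (~0.7 GB in float32).
rng = np.random.default_rng(0)
x = rng.standard_normal(43264, dtype=np.float32)          # flattened 256*13*13 input
W = rng.standard_normal((4096, 43264), dtype=np.float32)  # one weight per connection
b = rng.standard_normal(4096, dtype=np.float32)           # one bias per output neuron

y = W @ x + b   # weighted sum over all connections, plus bias
print(y.shape)  # (4096,)
```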
It's connected the same way as in an MLP.
But why 4096? There is no particular reasoning; it's just a choice. It could have been 8000, it could have been 20; it just depends on what works best for the network.
You are right that the last convolutional layer has 256 x 13 x 13 = 43264 neurons. However, it is followed by a max-pooling layer with pool_size = 3 and stride = 2, which produces an output of size 256 x 6 x 6.
You connect this to a fully-connected layer. In order to do that, you first have to flatten the output, which will take the shape 256 x 6 x 6 = 9216 x 1. To map 9216 neurons to 4096 neurons, we introduce a 9216 x 4096 weight matrix as the weight of the dense/fully-connected layer. Therefore, w^T * x = [9216 x 4096]^T * [9216 x 1] = [4096 x 1]. In short, each of the 9216 neurons will be connected to all 4096 neurons. That is why the layer is called a dense or a fully-connected layer.
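For a concrete picture, here is a minimal PyTorch sketch of just this pool -> flatten -> dense step (not the full AlexNet; it assumes the standard pool_size = 3, stride = 2 described above):

```python
import torch
import torch.nn as nn

# Last conv output: batch of 1, 256 channels, 13x13 spatial dims.
x = torch.randn(1, 256, 13, 13)

pool = nn.MaxPool2d(kernel_size=3, stride=2)  # (13 - 3)/2 + 1 = 6
fc = nn.Linear(256 * 6 * 6, 4096)             # 9216 -> 4096

x = pool(x)                          # -> (1, 256, 6, 6)
x = torch.flatten(x, start_dim=1)    # -> (1, 9216)
x = fc(x)                            # -> (1, 4096)
print(x.shape)                       # torch.Size([1, 4096])
```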
As others have said above, there is no hard rule that this should be 4096. The dense layer just has to have enough neurons to capture the variability of the entire dataset. The dataset under consideration, ImageNet-1K, is quite difficult and has 1000 categories, so 4096 neurons do not seem like too many to start with.