I've read some papers about Convolutional Neural Networks and found that almost all of them call the fully connected layers in a typical CNN the "top layers".
However, as shown in most papers, a typical CNN has a top-down structure, and the fully connected layers, which are usually followed by a softmax classifier, sit at the bottom of the network. So why do we call them the "top layers"? Is this just a convention, or are there other considerations I don't know about?
A fully connected layer is simply a feed-forward neural network. Fully connected layers form the last few layers of the network. The input to the first fully connected layer is the output of the final pooling or convolutional layer, which is flattened before being fed in.
Fully Connected Layer or Network: A fully connected layer is the last part of a convolutional neural network. It can be a single dense layer or a full multilayer perceptron. Its inputs come from the flatten layer, so the flatten layer can be seen as the input layer of this component.
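The flatten-then-dense step described above can be sketched in a few lines of numpy. This is a minimal illustration, not any particular framework's implementation; the shapes (a 4x4x8 pooled feature map, 10 classes) and the random weights are placeholder assumptions.

```python
import numpy as np

# Hypothetical output of the final pooling layer: a 4x4 feature map with 8 channels.
pooled = np.random.rand(4, 4, 8)

# Flatten layer: reshape the 3-D feature map into a 1-D vector of 4*4*8 = 128 values.
flat = pooled.reshape(-1)

# Fully connected (dense) layer: a plain matrix multiply plus bias,
# here mapping 128 inputs to 10 class scores (weights are random stand-ins).
W = np.random.rand(128, 10)
b = np.zeros(10)
logits = flat @ W + b

# Softmax classifier turns the scores into probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(flat.shape)   # (128,)
print(probs.shape)  # (10,)
```

The point is that once the feature map is flattened, the "fully connected" part is nothing more exotic than an ordinary matrix multiplication followed by a nonlinearity or softmax.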
Common Architectures and Training Patterns: As we have seen, convolutional neural networks are made up of four primary layer types: CONV, POOL, RELU, and FC.
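A common way those four layer types are stacked can be written out as a simple ordering (names only, no framework; the exact number of blocks is an illustrative assumption):

```python
# The convolutional "body" repeats CONV -> RELU -> POOL blocks, and the
# fully connected layers sit at the end as the classifier head.
architecture = (
    ["CONV", "RELU", "POOL"] * 2   # feature-extraction blocks
    + ["FC", "RELU", "FC"]         # fully connected top / head
)
print(" -> ".join(architecture))
```

Note that the FC layers always come last in this ordering — which is exactly the part of the network the question is about.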
I think it's just a matter of taste, but saying the "top layers" correlates with the notion of a "head" in neural networks. People say "classification head" or "regression head" to mean the output layers of the network (this terminology is used in tf.estimator.Estimator; also see some discussions here and here). If you see it this way, the layers just before the head are the top ones, while the input layers are the bottom. In any case, you should double-check which particular layers are meant when they are referred to as the "top".
There is a good reason to distinguish them from the rest of the layers, well beyond "convention".
CNNs have many layers, each looking at a different level of abstraction. The network starts with very simple shapes and edges and later learns, for example, to recognise eyes and other complex features. In a typical setting the top layer will be a fully connected network one or two layers deep. Now, the important piece: the top layer's weights are the most directly influenced by the labels. That is the layer that effectively makes the decision (or rather produces the probabilities) that something is a cat.
Imagine now that you want to build your own model to recognise cute cats, not just cats. If you start from scratch, you have to provide a large volume of training examples so that the model learns what constitutes a cat in the first place. Often you don't have the luxury of that amount of data, or of enough processing power. What you might do instead is take an existing pretrained model, discard its top layer(s), and train a new top on your own data.
The idea behind this is that the original model has learned to recognise generic features in its convolutional layers, and these can be reused. The top layer already goes beyond generic, into the specific classes that were in the original training set — and that part can be discarded. No cute cats there.
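The freeze-the-body, replace-the-top idea can be sketched with numpy. This is a toy stand-in, not a real framework API: the "pretrained" feature extractor is a random frozen matrix, the new head is trained for one gradient step on a single made-up sample, and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" body: a frozen mapping from a 64-dim flattened
# input to a 32-dim generic feature vector (random stand-in weights here).
W_features = rng.standard_normal((64, 32))  # frozen: never updated

def extract_features(x):
    # The reused CNN body, treated as a fixed function (ReLU activation).
    return np.maximum(x @ W_features, 0.0)

# New top layer for the "cute cat" task: 32 features -> 2 classes,
# trained from scratch while W_features stays frozen.
W_top = np.zeros((32, 2))

# One gradient-descent step on the new top only: a single sample x with a
# one-hot label y, cross-entropy loss on a softmax output.
x = rng.standard_normal(64)
y = np.array([1.0, 0.0])

f = extract_features(x)
logits = f @ W_top
p = np.exp(logits - logits.max())
p /= p.sum()

grad = np.outer(f, p - y)  # gradient w.r.t. W_top only
W_top -= 0.1 * grad        # the frozen body receives no update
```

Real frameworks expose the same pattern directly; in Keras, for instance, the pretrained models under `tf.keras.applications` accept `include_top=False` precisely so that the fully connected top can be dropped and replaced.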