I've read some papers about Convolutional Neural Networks and found that almost all of them call the fully connected layers in a typical CNN the "top layers".
However, as shown in most papers, a typical CNN has a top-down structure, and the fully connected layers, which are usually followed by a softmax classifier, sit at the bottom of the network. So why do we call them the "top layers"? Is this just a convention, or are there other considerations I don't know about?
A fully connected layer is simply a feed-forward neural network. Fully connected layers form the last few layers of the network. The input to the first fully connected layer is the output of the final pooling or convolutional layer, which is flattened before being fed in.
Fully Connected Layer or Network: A fully connected layer is the last part of a convolutional neural network. It can be a single dense layer or a full multilayer perceptron. Its inputs come from the flatten layer, so the flatten layer can be seen as the input layer of this component.
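The flatten-then-dense step described above can be sketched in a few lines of numpy. This is a minimal illustration, not any particular framework's implementation; the shapes (a 4x4x8 pooled feature map, 10 classes) and the random weights are placeholder assumptions.

```python
import numpy as np

# Hypothetical output of the final pooling layer: a 4x4 feature map with 8 channels.
pooled = np.random.rand(4, 4, 8)

# Flatten layer: reshape the 3-D feature map into a 1-D vector of 4*4*8 = 128 values.
flat = pooled.reshape(-1)

# Fully connected (dense) layer: a plain matrix multiply plus bias,
# here mapping 128 inputs to 10 class scores (weights are random stand-ins).
W = np.random.rand(128, 10)
b = np.zeros(10)
logits = flat @ W + b

# Softmax classifier turns the scores into probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(flat.shape)   # (128,)
print(probs.shape)  # (10,)
```

The point is that once the feature map is flattened, the "fully connected" part is nothing more exotic than an ordinary matrix multiplication followed by a nonlinearity or softmax.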
Common Architectures and Training Patterns: As we have seen, convolutional neural networks are made up of four primary layer types: CONV, POOL, RELU, and FC.
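A common way those four layer types are stacked can be written out as a simple ordering (names only, no framework; the exact number of blocks is an illustrative assumption):

```python
# The convolutional "body" repeats CONV -> RELU -> POOL blocks, and the
# fully connected layers sit at the end as the classifier head.
architecture = (
    ["CONV", "RELU", "POOL"] * 2   # feature-extraction blocks
    + ["FC", "RELU", "FC"]         # fully connected top / head
)
print(" -> ".join(architecture))
```

Note that the FC layers always come last in this ordering — which is exactly the part of the network the question is about.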
I think it's just a matter of taste, but saying the "top layers" correlates with the notion of a "head" in neural networks. People say "classification head" or "regression head" to mean the output layers of the network (this terminology is used in tf.estimator.Estimator; also see some discussions here and here). If you see it this way, the layers just before the head are the top ones, while the input layers are the bottom. In any case, you should double-check which particular layers are meant when they are referred to as the "top".
There is a good reason to distinguish them from the rest of the layers, well beyond "convention".
CNNs have many layers, each looking at a different level of abstraction. The network starts with very simple shapes and edges and later learns, for example, to recognise eyes and other complex features. In a typical setting the top layer will be a fully connected network one or two layers deep. Now, the important piece: the top layer's weights are the most directly influenced by the labels. That is the layer that effectively makes the decision (or rather produces the probabilities) that something is a cat.
Imagine now that you want to build your own model to recognise cute cats, not just cats. If you start from scratch, you have to provide a large volume of training examples so that the model learns what constitutes a cat in the first place. Often you don't have the luxury of that amount of data, or of enough processing power. What you might do instead is take an existing pretrained model, discard its top layer(s), and train a new top on your own data.
The idea behind this is that the original model has learned to recognise generic features in its convolutional layers, and these can be reused. The top layer already goes beyond generic, into the specific classes that were in the original training set — and that part can be discarded. No cute cats there.
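The freeze-the-body, replace-the-top idea can be sketched with numpy. This is a toy stand-in, not a real framework API: the "pretrained" feature extractor is a random frozen matrix, the new head is trained for one gradient step on a single made-up sample, and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" body: a frozen mapping from a 64-dim flattened
# input to a 32-dim generic feature vector (random stand-in weights here).
W_features = rng.standard_normal((64, 32))  # frozen: never updated

def extract_features(x):
    # The reused CNN body, treated as a fixed function (ReLU activation).
    return np.maximum(x @ W_features, 0.0)

# New top layer for the "cute cat" task: 32 features -> 2 classes,
# trained from scratch while W_features stays frozen.
W_top = np.zeros((32, 2))

# One gradient-descent step on the new top only: a single sample x with a
# one-hot label y, cross-entropy loss on a softmax output.
x = rng.standard_normal(64)
y = np.array([1.0, 0.0])

f = extract_features(x)
logits = f @ W_top
p = np.exp(logits - logits.max())
p /= p.sum()

grad = np.outer(f, p - y)  # gradient w.r.t. W_top only
W_top -= 0.1 * grad        # the frozen body receives no update
```

Real frameworks expose the same pattern directly; in Keras, for instance, the pretrained models under `tf.keras.applications` accept `include_top=False` precisely so that the fully connected top can be dropped and replaced.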