Idea behind how many fully-connected layers should be used in a general CNN network

I notice that some famous CNN architectures in ILSVRC, such as AlexNet, VGG, ZF Net, etc., all use two fully-connected layers followed by the output layer. So why two? Is there any intrinsic idea behind this?

I try to understand it this way: before the fully-connected layers, we have a bunch of convolutional layers, which might contain various high-level features. The fully-connected layer is then something like a feature list abstracted from the convolutional layers. But in that sense, one FC layer should be enough. Why two? And why not three or four or more? I guess one constraint behind this might be computational cost. But do more FC layers always provide better results? And what might be the reason for choosing two?

Ziqi Liu asked Oct 30 '17

1 Answer

And the fully-connected layer is something like a feature list abstracted from convoluted layers.

Yes, that's correct. The goal of this layer is to combine the features detected from the image patches for a particular task. In some (very simplified) sense, the conv layers are smart feature extractors, and the FC layers are the actual network.
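That split is easy to see in a toy forward pass: treat the flattened conv output as a fixed feature vector and push it through two FC layers plus an output layer. This is just an illustrative sketch with tiny made-up sizes (real VGG uses a 7*7*512 feature map and 4096-unit FC layers); all names and dimensions here are my own stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is the flattened output of the conv feature extractor.
conv_features = rng.standard_normal(18)

W1 = rng.standard_normal((8, 18))       # first FC layer
W2 = rng.standard_normal((8, 8))        # second FC layer
W_out = rng.standard_normal((3, 8))     # output layer (3 classes)

h1 = np.maximum(W1 @ conv_features, 0)  # FC + ReLU
h2 = np.maximum(W2 @ h1, 0)             # FC + ReLU
scores = W_out @ h2                     # class scores

print(scores.shape)  # (3,)
```

Everything the FC head does is plain matrix multiplication on a fixed-size vector, which is why it is cheap in FLOPs but expensive in parameters.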

Why two? And why not three or four or more?

I can't say the exact reasons for these particular networks, but I can imagine a few possible reasons why this choice makes sense:

  • You don't want to make your first FC layer too big, because it contains most of the model parameters, or, in other words, consumes most of the memory. E.g., VGGNet has 7*7*512*4096 = 102,760,448 parameters in its first FC layer, which is 72% of all network parameters. Making it twice as big would push that share to 85%!

    Hence, two smaller FC layers, one after another, are generally more flexible, given memory constraints, than one big FC layer.

  • Conv layers matter much more for accuracy than the way their outputs are combined in the top layers. There's nothing wrong with three or more FC layers, but I don't think you'd see any significant change if you tried that.

    In fact, the all-convolutional network has shown that one can greatly simplify the network by replacing FC layers with convolutional layers without visible performance degradation. I'd like to stress this: those networks contain no FC layers at all. I wouldn't be surprised if the authors of the networks above didn't spend much time on the FC part and focused on the earlier layers instead. The latest CNNs tend to get rid of FC layers as well.
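A quick back-of-the-envelope check of the memory argument in the first bullet, in plain Python. The layer sizes are VGG's; the 72% share is the figure quoted above, taken at face value to derive the rest:

```python
# First FC layer of VGG: 7x7x512 conv output feeding 4096 hidden units.
fc1 = 7 * 7 * 512 * 4096
print(fc1)  # 102760448 -- matches the number quoted above

# If that is 72% of all parameters, the implied total is:
total = fc1 / 0.72

# Doubling the first FC layer adds another fc1 parameters on top:
doubled_share = 2 * fc1 / (total + fc1)
print(round(doubled_share * 100))  # 84 -- roughly the 85% claimed above
```

The point survives the rounding: the first FC layer dominates the parameter budget, and growing it makes that dominance worse.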

By the way, I don't think computational cost is a big factor as far as FC layers are concerned, because most of the computation happens in the conv layers. Remember that each convolution weight is applied at every spatial position, so a conv layer costs far more FLOPs per parameter than a plain matrix multiplication.
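To put rough numbers on that: a conv weight is reused at every output position, while an FC weight is used once per forward pass. Here is my own back-of-the-envelope multiply count comparing one mid-network VGG-16-style conv layer with the big FC layer:

```python
# One mid-network conv layer: 56x56 output, 3x3 kernels, 128 -> 256 channels.
# Multiplies = output positions * kernel volume * output channels.
conv_mults = 56 * 56 * (3 * 3 * 128) * 256

# The big FC layer: one multiply per weight.
fc_mults = 7 * 7 * 512 * 4096

print(conv_mults // fc_mults)  # 9 -- this single conv layer costs ~9x the FC layer
```

And VGG has a dozen conv layers of comparable cost, so the FC layers are a small slice of the total FLOPs even though they hold most of the parameters.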

Maxim answered Sep 27 '22