I'm just beginning my ML journey and have done a few tutorials. One thing that's not clear (to me) is how the 'filter' parameter is determined for Keras Conv2D.
Most sources I've read simply set the parameter to 32 without explanation. Is this just a rule of thumb or do the dimensions of the input images play a part? For example, the images in CIFAR-10 are 32x32
Specifically:
model = Sequential()
filters = 32
model.add(Conv2D(filters, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(filters, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
The next layer has a filter parameter of filter*2 or 64. Again, how is this calculated?
Tx.
Joe
filters. Mandatory Conv2D parameter is the numbers of filters that convolutional layers will learn from. It is an integer value and also determines the number of output filters in the convolution.
Conv2D Layers By applying this formula to the first Conv2D layer (i.e., conv2d ), we can calculate the number of parameters using 32 * (1 * 3 * 3 + 1) = 320, which is consistent with the model summary. The input channel number is 1, because the input data shape is 28 x 28 x 1 and the number 1 is the input channel.
The first required Conv2D parameter is the number of filters that the convolutional layer will learn. Layers early in the network architecture (i.e., closer to the actual input image) learn fewer convolutional filters while layers deeper in the network (i.e., closer to the output predictions) will learn more filters.
Convolutional Neural Networks are (usually) supervised methods for image/object recognition. This means that you need to train the CNN using a set of labelled images: this allows to optimize the weights of its convolutional filters, hence learning the filters shape themselsves, to minimize the error.
Actually - there is no a good answer to your question. Most of the architectures are usually carefully designed and finetuned during many experiments. I could share with you some of the rules of thumbs one should apply when designing its own architecture:
Avoid a dimension collapse in the first layer. Let's assume that your input filter has a (n, n)
spatial shape for RGB
image. In this case, it is a good practice to set the filter numbers to be greater than n * n * 3
as this is the dimensionality of the input of a single filter. If you set smaller number - you could suffer from the fact that many useful pieces of information about the image are lost due to initialization which dropped informative dimensions. Of course - this is not a general rule - e.g. for a texture recognition, where image complexity is lower - a small number of filters might actually help.
Think more about volume than filters number - when setting the number of filters it's important to think about the volume change instead of the change of filter numbers between the consecutive layers. E.g. in VGG
- even though the number of filters doubles after pooling layer - the actual feature map volume is decreased by a factor of 2, because of pooling decreasing the feature map by a factor of 4
. Usually decreasing the size of the volume by more than 3 should be considered as a bad practice. Most of the modern architectures use the volume drop factor in the range between 1 and 2. Still - this is not a general rule - e.g. in case of a narrow hierarchy - the greater value of volume drop might actually help.
Avoid bottlenecking. As one may read in this milestone paper bottlenecking might seriously harm your training process. It occurs when dropping the volume is too severe. Of course - this still might be achieved - but then you should use the intelligent downsampling, used e.g. in Inception v>2
Check 1x1 convolutions - it's believed that filters activation are highly correlated. One may take advantage of it by using 1x1 convolutions - namely convolution with a filter size of 1. This makes possible e.g. volume dropping by them instead of pooling
or intelligent downsampling (see example here). You could e.g. build twice more filters and then cut 25% of them by using 1x1 convs as a consecutive layer.
As you may see. There is no easy way to choose the number of filters. Except for the hints above, I'd like to share with you one of my favorite sanity checks on the number of filters. It takes 2 easy steps:
Usually - if the number of filters is too low (in general) - these two tests will show you that. If - during your training process - with regularization - your network severely overfits - this is a clear indicator that your network has way too many filters.
Cheers.
Number of filters is chosen based complexity of task. More complex tasks require more filters. And usually number of filters grows after every layer (eg 128 -> 256 -> 512
). First layers (with lower number of filters) catch few of some simple features of images (edges, color tone, etc) and next layers are trying to obtain more complex features based on simple ones.
There is nice course from Stanford to give you intuition and understanding of CNN.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With