What is the advantage of using multiples of the same filter in convolutional networks in deep learning

What is the advantage of using multiples of the same filter in convolutional networks in deep learning?

For example: we use 6 filters of size [5,5] in the first layer to scan the image data, which is a matrix of size [28,28]. The question is why we don't use a single filter of size [5,5] instead of 6 or more of them. In the end they all scan exactly the same pixels. I can see that the random initial weights might differ, but the DL model will adjust to that anyway.

So, specifically, what is the main advantage and purpose of using multiple filters of the same shape in convnets?

902

asked Jan 04 '18 20:01

entropy

2 Answers

Why is filter shape the same?

First, the kernel shape is the same mainly to speed up computation. It allows the convolution to be applied in one batch, for example using the im2col transformation followed by a matrix multiplication. It also makes it convenient to store all the weights in one multidimensional array. Mathematically, though, nothing prevents using several filters of different shapes.
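To make the batching argument concrete, here is a minimal NumPy sketch, assuming a single grayscale 28x28 image, 6 filters of size 5x5, stride 1 and no padding. The helper name `im2col` and all variable names are illustrative, not taken from any particular library. Because every filter has the same shape, the weights fit in one array and one matrix multiplication produces all 6 feature maps.

```python
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of a 2-D image into one row of a matrix."""
    h, w = image.shape
    out_h, out_w = h - k + 1, w - k + 1
    rows = [
        image[i:i + k, j:j + k].ravel()
        for i in range(out_h)
        for j in range(out_w)
    ]
    return np.array(rows), (out_h, out_w)

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))     # one grayscale image
filters = rng.standard_normal((6, 5, 5))  # 6 filters, all 5x5, in one array

patches, (out_h, out_w) = im2col(image, 5)  # shape (576, 25)
weights = filters.reshape(6, -1)            # shape (6, 25)

# A single matrix multiplication applies all 6 filters to all patches.
feature_maps = (patches @ weights.T).T.reshape(6, out_h, out_w)
print(feature_maps.shape)  # (6, 24, 24)
```

With filters of different shapes, the `filters.reshape(6, -1)` step would not produce a rectangular matrix, so each filter would need its own pass.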

Some architectures, such as the Inception network, use this idea: they apply different convolutional layers (with different kernel sizes) in parallel and stack the resulting feature maps at the end. This turned out to be very useful.
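A heavily simplified sketch of that parallel-branch idea, in plain NumPy: a 3x3 branch and a 5x5 branch run over the same image with zero padding so their outputs have the same spatial size, and the feature maps are stacked along the channel axis. The branch sizes (4 and 2 filters) and the helper `correlate_same` are assumptions for illustration. A real Inception module also has 1x1 convolutions and a pooling branch, which are omitted here.

```python
import numpy as np

def correlate_same(image, kernel):
    """Cross-correlate with zero padding so the output keeps the input size.

    Assumes a square kernel with an odd side length.
    """
    k = kernel.shape[0]
    padded = np.pad(image, k // 2)
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))
branch_3x3 = rng.standard_normal((4, 3, 3))  # hypothetical: 4 filters of 3x3
branch_5x5 = rng.standard_normal((2, 5, 5))  # hypothetical: 2 filters of 5x5

# Each branch is applied to the same input; results are stacked as channels.
maps = [correlate_same(image, f) for f in branch_3x3]
maps += [correlate_same(image, f) for f in branch_5x5]
stacked = np.stack(maps)
print(stacked.shape)  # (6, 28, 28)
```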

Why isn't one filter enough?

Because each filter tends to learn one pattern that excites it, e.g., a Gabor-like vertical line. A single filter can't be equally excited by a horizontal and a vertical line. So to recognize an object, one such filter is not enough.

For example, in order to recognize a cat, a neural network might need to recognize the eyes, the tail, and so on, all of which are composed of different lines and edges. The network can be confident about the object in the image only if it can recognize a whole variety of shapes and patterns in it. This is true even for a simple data set like MNIST.
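The point about a single filter can be checked with a small hand-made example. The 3x3 kernel below is a hypothetical vertical-line detector chosen for illustration (a learned filter would look different), and `correlate_valid` is a plain sliding-window helper, not a library function.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Two 7x7 test images: one vertical line, one horizontal line.
vertical_line = np.zeros((7, 7))
vertical_line[:, 3] = 1.0
horizontal_line = vertical_line.T

# Hand-made detector: positive centre column, negative neighbours.
vertical_detector = np.array([[-1.0, 2.0, -1.0],
                              [-1.0, 2.0, -1.0],
                              [-1.0, 2.0, -1.0]])
horizontal_detector = vertical_detector.T

print(correlate_valid(vertical_line, vertical_detector).max())      # 6.0
print(correlate_valid(horizontal_line, vertical_detector).max())    # 0.0
print(correlate_valid(horizontal_line, horizontal_detector).max())  # 6.0
```

The vertical detector gives no response at all to the horizontal line, so a second filter is needed to pick it up.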

Why do filters learn different patterns?

A simple analogy: imagine a regression network with one fully connected hidden layer. Each neuron in the hidden layer is connected to every input feature, so by construction they are all symmetrical. But after some training, different neurons are going to learn different high-level features that are useful for making a correct prediction.

There's a catch: if the network is initialized with zeros, it's going to suffer from symmetry issues: every hidden neuron computes the same output and receives the same gradient, so the neurons stay identical and the network in general won't converge to the target distribution. So it's essential to create asymmetry in the neurons from the very beginning and let different neurons get excited differently by the same input data. This in turn leads to different gradients being applied to the weights, usually increasing the asymmetry even more. That's why different neurons are trained differently.

It's important to mention another issue that is still possible with random init, called co-adaptation: different neurons learn to adapt to and depend on each other. This problem is commonly mitigated by dropout, and later also by batch normalization, essentially by adding noise to the training process in various ways. Taken together, neurons are much more likely to learn different latent representations of the data.
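The symmetry problem can be reproduced on a toy network. The sketch below is an assumption-laden illustration, not a recipe: one sigmoid hidden layer with 4 units, no biases, mean squared error, plain gradient descent, and an arbitrary synthetic target. It trains the same architecture twice, once from all-zero weights and once from small random weights, and then checks whether the hidden units ended up with identical incoming weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, W2, X, y, lr=0.1, steps=50):
    """Gradient descent on mean squared error; biases omitted for brevity."""
    W1, W2 = W1.copy(), W2.copy()
    n = len(X)
    for _ in range(steps):
        H = sigmoid(X @ W1)                  # hidden activations (n, hidden)
        err = H @ W2 - y                     # prediction error (n, 1)
        grad_W2 = H.T @ err / n
        grad_H = err @ W2.T * H * (1.0 - H)  # backprop through the sigmoid
        grad_W1 = X.T @ grad_H / n
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
y = np.sin(X[:, :1]) + X[:, 1:2] ** 2        # arbitrary toy target

hidden = 4
W1_zero, _ = train(np.zeros((3, hidden)), np.zeros((hidden, 1)), X, y)
W1_rand, _ = train(0.5 * rng.standard_normal((3, hidden)),
                   0.5 * rng.standard_normal((hidden, 1)), X, y)

# Column k of W1 holds the incoming weights of hidden unit k.
zero_units_identical = np.allclose(W1_zero, W1_zero[:, :1])
rand_units_identical = np.allclose(W1_rand, W1_rand[:, :1])
print(zero_units_identical, rand_units_identical)
```

With zero initialization every hidden unit receives the same gradient at every step, so the four units should remain copies of one another, while the randomly initialized units differ from the start and stay different. The same argument applies to convolutional filters that start from identical weights.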

Further links

I highly recommend reading Stanford's CS231n tutorial to gain a better intuition about convolutional neural networks.

189

answered Sep 29 '22 14:09

Maxim

Zeiler and Fergus (https://arxiv.org/pdf/1311.2901.pdf) have a good picture showing kernel responses to different parts of a picture.

Each kernel convolves over the image, so all the kernels (potentially) see all the pixels. Each of your 6 filters will "learn" a different feature. In the first layer, some will typically learn line features (horizontal, vertical, diagonal) and some will learn colour blobs. In the next layer, these get combined: pixels into edges, edges into shapes.

It might help to look up Prewitt filters: https://en.m.wikipedia.org/wiki/Prewitt_operator. In this case, a single 3x3 kernel convolves over the whole image and gives a feature map showing horizontal (or vertical) edges. You need one filter for horizontal and a different filter for vertical, but you can combine their outputs to get both. In a neural network, the kernel values are learned from data, but the feature maps at each layer are still produced by convolving the kernel over the input.
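A short NumPy sketch of the Prewitt idea, assuming an 8x8 test image with a single vertical edge. The helper `correlate_valid` is illustrative, and the sign convention of the kernels varies between references (only the sign of the response changes).

```python
import numpy as np

def correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Fixed (not learned) Prewitt kernels.
prewitt_x = np.array([[-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0]])  # left-to-right change: vertical edges
prewitt_y = prewitt_x.T                   # top-to-bottom change: horizontal edges

# Dark left half, bright right half: one vertical edge.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

gx = correlate_valid(image, prewitt_x)  # feature map for vertical edges
gy = correlate_valid(image, prewitt_y)  # feature map for horizontal edges
magnitude = np.hypot(gx, gy)            # combines both orientations

print(gx.max(), np.abs(gy).max(), magnitude.max())  # 3.0 0.0 3.0
```

Each kernel yields its own feature map, and only the combination responds to edges of either orientation. A convolutional layer does the same thing with learned kernel values and leaves the combining to the next layer.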

answered Sep 29 '22 14:09

Pam

Tags:

machine-learning

neural-network

tensorflow

deep-learning

conv-neural-network