How filters are initialized in convnet

Tags: tensorflow, deep-learning, keras, theano

I've read a lot of papers on convnets, but there is one thing I don't understand: how are the filters in a convolutional layer initialized? For example, the filters in the first layer should detect edges, etc. But if they are initialized randomly, how can they be accurate? The same goes for the next layers and the high-level features. Another question: what is the range of the values in those filters?

Many thanks to you!

asked Dec 07 '16 14:12

Pusheen_the_dev

1 Answer

You can either initialize the filters randomly or pretrain them on some other data set.

Some references:

http://deeplearning.net/tutorial/lenet.html:

Notice that a randomly initialized filter acts very much like an edge detector!

Note that we use the same weight initialization formula as with the MLP. Weights are sampled randomly from a uniform distribution in the range [-1/fan-in, 1/fan-in], where fan-in is the number of inputs to a hidden unit. For MLPs, this was the number of units in the layer below. For CNNs however, we have to take into account the number of input feature maps and the size of the receptive fields.
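The quoted formula can be sketched in NumPy. For a convolutional layer, each output unit sees in_channels * kernel_h * kernel_w inputs, so that product is the fan-in. The function name init_conv_filters and the (out_channels, in_channels, kernel_h, kernel_w) layout are assumptions for illustration, not the tutorial's own code. This also answers the question about the value range: under this scheme every initial filter value lies in [-1/fan-in, 1/fan-in].

```python
import numpy as np

def init_conv_filters(out_channels, in_channels, kernel_h, kernel_w, seed=0):
    """Sample conv filters uniformly from [-1/fan_in, 1/fan_in].

    fan_in is the number of inputs feeding one output unit:
    the number of input feature maps times the receptive-field size.
    """
    fan_in = in_channels * kernel_h * kernel_w
    bound = 1.0 / fan_in
    rng = np.random.default_rng(seed)
    # Assumed layout: (out_channels, in_channels, kernel_h, kernel_w)
    return rng.uniform(-bound, bound, size=(out_channels, in_channels, kernel_h, kernel_w))

# Example: 20 filters of size 5x5 over a 3-channel (RGB) input, so fan_in = 75.
W = init_conv_filters(20, 3, 5, 5)
print(W.shape, W.min(), W.max())
```

Current libraries usually default to variance-scaled schemes instead; for example, Keras layers default to Glorot (Xavier) uniform initialization, whose bound is sqrt(6 / (fan_in + fan_out)).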

http://cs231n.github.io/transfer-learning/ :

Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:

ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer's outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.
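The fixed-feature-extractor pipeline can be sketched as follows. A real setup would load pretrained weights; here a frozen random projection followed by ReLU (frozen_features, hypothetical) stands in for the pretrained ConvNet so the example is self-contained, and a softmax classifier trained by batch gradient descent stands in for the linear classifier. Only the classifier weights are ever updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained network: fixed weights that are never updated.
W_frozen = rng.normal(scale=0.5, size=(2, 64))

def frozen_features(x):
    """Return ReLU-thresholded 'CNN codes' from the frozen extractor."""
    return np.maximum(x @ W_frozen, 0.0)

def train_softmax(codes, labels, num_classes, lr=0.01, epochs=300):
    """Fit a linear softmax classifier on fixed codes with batch gradient descent."""
    n, d = codes.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = codes @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * codes.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy two-class data: two Gaussian blobs in 2-D (stand-in for the new dataset).
x = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

codes = frozen_features(x)  # extract the codes once, then reuse them
W, b = train_softmax(codes, y, num_classes=2)
accuracy = np.mean(np.argmax(codes @ W + b, axis=1) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the extractor is fixed, the codes can be computed once and cached; training then touches only the small linear model.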

Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it's possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In the case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
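Keeping the earlier layers fixed while fine-tuning the later ones amounts to skipping the frozen layers in the weight update. This sketch assumes plain SGD, a dict of NumPy arrays keyed by hypothetical AlexNet-style layer names, and precomputed gradients; a real framework would obtain the gradients through backpropagation.

```python
import numpy as np

def sgd_step(params, grads, lr, frozen):
    """Apply one SGD update in place, skipping layers listed in `frozen`."""
    for name, weights in params.items():
        if name in frozen:
            continue  # frozen: keep the pretrained (generic) features unchanged
        weights -= lr * grads[name]

# Hypothetical layer names: conv1-conv5 and fc6-fc7 come from a pretrained model,
# fc_new is the freshly initialized classifier for the new dataset.
names = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc_new"]
params = {name: np.ones(3) for name in names}
grads = {name: np.full(3, 0.5) for name in names}

# Freeze the two earliest layers; fine-tune everything above them.
frozen = set(names[:2])
sgd_step(params, grads, lr=0.1, frozen=frozen)
print(params["conv1"], params["fc_new"])  # conv1 unchanged, fc_new moved by lr * grad
```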

Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.

When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.

New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won't overfit if we were to try to fine-tune through the full network.

New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.

New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
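The four rules of thumb above can be encoded as a small lookup. The function name and the boolean inputs are illustrative assumptions; in practice "small", "large", and "similar" are judgment calls rather than fixed thresholds.

```python
def transfer_strategy(dataset_is_large: bool, similar_to_original: bool) -> str:
    """Map the four transfer-learning scenarios to a suggested strategy (rules of thumb only)."""
    if not dataset_is_large and similar_to_original:
        return "train a linear classifier on the CNN codes from the top of the network"
    if not dataset_is_large and not similar_to_original:
        return "train a linear classifier on activations from earlier layers"
    if dataset_is_large and similar_to_original:
        return "fine-tune through the full network"
    return "initialize from pretrained weights and fine-tune the entire network"

for large in (False, True):
    for similar in (True, False):
        print(f"large={large}, similar={similar}: {transfer_strategy(large, similar)}")
```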

Practical advice. There are a few additional things to keep in mind when performing Transfer Learning:

Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can't arbitrarily take out Conv layers from the pretrained network. However, some changes are straightforward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides "fit"). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.
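The FC-to-conv equivalence can be checked numerically. This sketch uses small stand-in sizes (4 channels, a 6x6 volume, 3 output units) instead of the real network's, and a naive stride-1 convolution written for illustration (conv2d_valid is a hypothetical helper, not a library function). Reshaping each FC weight row into a C x 6 x 6 filter gives identical outputs on a 6x6 input, and on a larger input the same filters slide to produce a spatial map.

```python
import numpy as np

def conv2d_valid(x, filters, bias):
    """Naive stride-1 convolution (cross-correlation) with padding 0.

    x: (C, H, W); filters: (K, C, kh, kw); bias: (K,). Returns (K, H-kh+1, W-kw+1).
    """
    k, c, kh, kw = filters.shape
    _, h, w = x.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract over channel and spatial axes for every filter at once.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
C, S, K = 4, 6, 3                       # small stand-ins: channels, spatial size, FC units
fc_w = rng.normal(size=(K, C * S * S))  # FC weights over the flattened volume
fc_b = rng.normal(size=K)

# Reshape each FC row into a C x S x S filter (matches NumPy's C-order flatten).
conv_filters = fc_w.reshape(K, C, S, S)

x = rng.normal(size=(C, S, S))
fc_out = fc_w @ x.ravel() + fc_b
conv_out = conv2d_valid(x, conv_filters, fc_b)
print(np.allclose(fc_out, conv_out[:, 0, 0]))  # True: same numbers

# On a larger input the converted layer slides, giving a spatial map of outputs.
x_big = rng.normal(size=(C, S + 2, S + 2))
print(conv2d_valid(x_big, conv_filters, fc_b).shape)  # (3, 3, 3)
```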

Learning rates. It's common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don't wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).
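A minimal sketch of per-layer learning rates, assuming plain SGD and a dict of NumPy arrays. The layer names, the base rate of 0.01, and the 10x reduction (finetune_scale=0.1) are illustrative assumptions, not values given in the source; they are hyperparameters to tune.

```python
import numpy as np

def layer_learning_rates(names, new_layers, base_lr=0.01, finetune_scale=0.1):
    """Give new (randomly initialized) layers base_lr and pretrained layers a smaller rate."""
    return {n: base_lr if n in new_layers else base_lr * finetune_scale for n in names}

def sgd_step_per_layer(params, grads, lrs):
    """One SGD update in place, using each layer's own learning rate."""
    for name, weights in params.items():
        weights -= lrs[name] * grads[name]

names = ["conv1", "conv5", "fc7", "fc_new"]  # hypothetical layer names
lrs = layer_learning_rates(names, new_layers={"fc_new"})
params = {n: np.zeros(2) for n in names}
grads = {n: np.ones(2) for n in names}
sgd_step_per_layer(params, grads, lrs)
print(params["conv1"], params["fc_new"])  # pretrained layer moves 10x less
```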

Additional References

CNN Features off-the-shelf: an Astounding Baseline for Recognition trains SVMs on features from an ImageNet-pretrained ConvNet and reports several state-of-the-art results.

DeCAF reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.

How transferable are features in deep neural networks? studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.

answered Oct 01 '22 03:10

Franck Dernoncourt
