
Caffe | solver.prototxt values setting strategy

In Caffe, I am trying to implement a Fully Convolutional Network for semantic segmentation. I was wondering whether there is a specific strategy for setting the 'solver.prototxt' values for the following hyper-parameters:

  • test_iter
  • test_interval
  • iter_size
  • max_iter

Does it depend on the number of images you have for your training set? If so, how?

asked Nov 18 '15 by Abhilash Panigrahi

People also ask

What is a solver in machine learning?

The solver orchestrates model optimization by coordinating the network's forward inference and backward gradients to form parameter updates that attempt to improve the loss.

What is the difference between model and solver in Caffe?

Caffe has a very nice abstraction that separates neural network definitions (models) from the optimizers (solvers). A model defines the structure of a neural network, while a solver defines all the information about how gradient descent will be conducted.

What default values does Caffe use for the Adam solver?

Kingma et al. [1] proposed to use β1 = 0.9, β2 = 0.999 and ε = 1e-8 as default values. Caffe uses the parameters momentum, momentum2 and delta for β1, β2 and ε, respectively. [1] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization.
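
As a rough sketch (not part of the original answer), assuming the string-valued type field of newer Caffe versions and a made-up base_lr, these defaults would appear in a solver.prototxt roughly as:

    type: "Adam"
    base_lr: 0.001    # made-up learning rate; pick one for your problem
    momentum: 0.9     # Adam's beta1
    momentum2: 0.999  # Adam's beta2
    delta: 1e-8       # Adam's epsilon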

What are the responsibilities of the Caffe solvers?

The responsibilities of learning are divided between the Solver, which oversees the optimization and generates parameter updates, and the Net, which yields loss and gradients. The Caffe solvers are: Stochastic Gradient Descent (SGD), AdaDelta, Adaptive Gradient (AdaGrad), Adam, Nesterov's Accelerated Gradient (Nesterov) and RMSprop.

How to do data pre-processing in Caffe?

    // For data pre-processing, we can do simple scaling and subtracting the
    // data mean, if provided. Note that the mean subtraction is always carried
    // out before scaling.
    // Specify the batch size.
    // Specify if we would like to randomly crop an image.
    // Specify if we want to randomly mirror data.
    // DEPRECATED: use LayerParameter.
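
These lines are comments from Caffe's .proto definitions. As a hedged illustration only (the layer name, file paths and values below are placeholders, not from the question), such options are typically set in a data layer's transform_param and data_param blocks of a 'train_val.prototxt':

    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TRAIN }
      transform_param {
        mean_file: "mean.binaryproto"  # placeholder path; the mean is subtracted first
        scale: 0.00390625              # then the data is scaled (here by 1/255)
        crop_size: 227                 # randomly crop 227x227 patches
        mirror: true                   # randomly mirror images
      }
      data_param {
        source: "train_lmdb"           # placeholder path
        batch_size: 64                 # placeholder batch size
        backend: LMDB
      }
    }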


1 Answer

In order to set these values in a meaningful manner, you need to have a few more bits of information regarding your data:

1. Training set size: the total number of training examples you have; let's call this quantity T.
2. Training batch size: the number of training examples processed together in a single batch. This is usually set by the input data layer in the 'train_val.prototxt'. For example, in this file the train batch size is set to 256. Let's denote this quantity by tb.
3. Validation set size: the total number of examples you set aside for validating your model; let's denote this by V.
4. Validation batch size: the value set in batch_size for the TEST phase. In this example it is set to 50. Let's call this vb. (A sketch showing where tb and vb are set follows this list.)
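
A minimal sketch of where tb and vb live in a 'train_val.prototxt' (layer names and paths are placeholders; the values 256 and 50 simply mirror the examples above):

    # TRAIN phase input layer: its batch_size is tb
    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TRAIN }
      data_param {
        source: "train_lmdb"  # placeholder path
        batch_size: 256       # tb
        backend: LMDB
      }
    }
    # TEST phase input layer: its batch_size is vb
    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TEST }
      data_param {
        source: "val_lmdb"    # placeholder path
        batch_size: 50        # vb
        backend: LMDB
      }
    }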

Now, during training, you would like to get an unbiased estimate of your net's performance every once in a while. To do so, you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb (rounded up if V is not an exact multiple of vb).
How often would you like to get this estimate? That is really up to you. If you have a very large validation set and a slow net, validating too often will make the training process too long. On the other hand, not validating often enough may prevent you from noticing if and when your training process fails to converge. test_interval determines how often you validate: usually for large nets you set test_interval on the order of 5K iterations; for smaller and faster nets you may choose lower values. Again, it is all up to you.
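
For example, a sketch of the corresponding solver.prototxt lines, assuming (made-up numbers) V = 2000 validation images and vb = 50:

    # test_iter * vb should cover the whole validation set:
    # test_iter = V / vb = 2000 / 50 = 40
    test_iter: 40
    # run a full validation pass every 5000 training iterations
    test_interval: 5000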

In order to cover the entire training set (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, thus max_iter=#epochs*T/tb.
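
Continuing with made-up numbers, say T = 12800 training images, tb = 256 and 50 epochs:

    # one epoch = T / tb = 12800 / 256 = 50 iterations
    # max_iter = #epochs * T / tb = 50 * 50
    max_iter: 2500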

Regarding iter_size: this parameter allows you to average gradients over several training mini-batches before applying an update, so the effective batch size becomes iter_size * tb. See this thread for more information.
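
For example (a sketch with made-up numbers): if memory only allows tb = 8 but you want each update computed from an effective batch of 32:

    # gradients are accumulated over iter_size mini-batches before each update,
    # so the effective batch size is iter_size * tb = 4 * 8 = 32
    iter_size: 4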

answered Sep 20 '22 by Shai