On Caffe, I am trying to implement a Fully Convolutional Network for semantic segmentation. I was wondering: is there a specific strategy for setting up the 'solver.prototxt' values for the following hyper-parameters:
- test_iter
- test_interval
- iter_size
- max_iter
Does it depend on the number of images you have for your training set? If so, how?
The solver orchestrates model optimization: it coordinates the network's forward inference and backward gradient computation to form the parameter updates that attempt to improve the loss. Caffe has a very nice abstraction that separates neural-network definitions (models) from the optimizers (solvers): a model defines the structure of the network, while a solver defines all the information about how gradient descent will be conducted. The responsibilities of learning are thus divided between the Solver, which oversees the optimization and generates parameter updates, and the Net, which yields the loss and gradients. The available Caffe solvers are SGD, AdaDelta, AdaGrad, Adam, Nesterov and RMSprop. For Adam, Kingma et al. [1] proposed β1 = 0.9, β2 = 0.999, ε = 1e-8 as default values; Caffe exposes these as the momentum, momentum2 and delta solver parameters, respectively.

[1] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.
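To make the model/solver separation concrete, here is a minimal sketch of what a 'solver.prototxt' using the Adam solver might look like. The net path and all numeric values are illustrative assumptions, not recommendations:

    # solver.prototxt -- optimizer configuration (hypothetical values)
    net: "train_val.prototxt"     # the model definition lives in its own file
    type: "Adam"                  # solver type; "SGD" is the default
    base_lr: 0.0001               # assumed starting learning rate
    momentum: 0.9                 # Adam beta1
    momentum2: 0.999              # Adam beta2
    delta: 1e-8                   # Adam epsilon
    lr_policy: "fixed"
    display: 20                   # print the loss every 20 iterations
    test_iter: 40                 # discussed below
    test_interval: 5000           # discussed below
    max_iter: 100000              # discussed below
    snapshot: 5000
    snapshot_prefix: "snapshots/fcn"
    solver_mode: GPU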
In order to set these values in a meaningful manner, you need to have a few more bits of information regarding your data:
1. Training set size - the total number of training examples you have; let's call this quantity T.
2. Training batch size - the number of training examples processed together in a single batch; this is usually set by the batch_size of the input data layer in the 'train_val.prototxt' (for example, 256; see the sketch after this list). Let's denote this quantity by tb.
3. Validation set size - the total number of examples you set aside for validating your model; let's denote this by V.
4. Validation batch size - the batch_size set for the TEST-phase data layer (for example, 50). Let's call this vb.
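As a sketch of where tb and vb live, this is what the two data layers in 'train_val.prototxt' might look like; the LMDB paths and batch sizes are assumptions for illustration:

    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TRAIN }
      data_param {
        source: "train_lmdb"   # hypothetical path
        batch_size: 256        # this is tb
        backend: LMDB
      }
    }
    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TEST }
      data_param {
        source: "val_lmdb"     # hypothetical path
        batch_size: 50         # this is vb
        backend: LMDB
      }
    }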
Now, during training, you would like to get an unbiased estimate of the performance of your net every once in a while. To do so you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb.
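For instance, assuming V = 2000 validation images and vb = 50 (both numbers made up for illustration), the corresponding solver entry would be:

    # test_iter = V / vb = 2000 / 50
    test_iter: 40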
How often would you like to get this estimate? It's really up to you. If you have a very large validation set and a slow net, validating too often will make the training process too long. On the other hand, not validating often enough may prevent you from noticing if and when your training process fails to converge. test_interval determines how often you validate: usually for large nets you set test_interval in the order of 5K iterations; for smaller and faster nets you may choose lower values. Again, it's all up to you.
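As a concrete (assumed) setting in the solver file:

    # validate every 5K training iterations;
    # note that test_interval is measured in iterations, not epochs
    test_interval: 5000

If you prefer to validate roughly once per epoch, you can instead set test_interval to about T/tb.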
In order to cover the entire training set (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, thus max_iter = #epochs * T/tb.
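For example, under the assumption of T = 10000 training images, tb = 256 and a target of 50 epochs:

    # one epoch = T / tb = 10000 / 256 ~= 40 iterations (rounded up)
    # max_iter = #epochs * T / tb = 50 * 40
    max_iter: 2000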
Regarding iter_size: this allows you to average gradients over several training mini-batches before applying a parameter update; see this thread for more information.
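This matters for FCNs in particular, where memory constraints often force a very small per-iteration batch_size. As a sketch (all numbers assumed), the effective batch size is batch_size * iter_size, and tb in the formulas above should be read as that effective size:

    # data layer batch_size: 1 (assumed, e.g. because full images do not
    # fit in GPU memory); accumulate gradients over 20 mini-batches:
    iter_size: 20
    # effective training batch size tb = 1 * 20 = 20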