On Caffe, I am trying to implement a Fully Convolutional Network for semantic segmentation. I was wondering: is there a specific strategy for setting up the 'solver.prototxt' values for the following hyper-parameters:
- test_iter
- test_interval
- iter_size
- max_iter
Does it depend on the number of images you have for your training set? If so, how?
The solver orchestrates model optimization: it coordinates the network's forward inference and backward gradient computation to form the parameter updates that attempt to improve the loss. Caffe has a very nice abstraction that separates neural-network definitions (models) from the optimizers (solvers): a model defines the structure of the network, while a solver defines all the information about how gradient descent will be conducted. The responsibilities of learning are thus divided between the Solver, which oversees the optimization and generates parameter updates, and the Net, which yields the loss and gradients. The available Caffe solvers are SGD, AdaDelta, AdaGrad, Adam, Nesterov and RMSprop. For Adam, Kingma et al. [1] proposed β1 = 0.9, β2 = 0.999, ε = 1e-8 as default values; Caffe exposes these as the momentum, momentum2 and delta solver parameters, respectively.

[1] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.
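To make the model/solver separation concrete, here is a minimal sketch of what a 'solver.prototxt' using the Adam solver might look like. The net path and all numeric values are illustrative assumptions, not recommendations:

    # solver.prototxt -- optimizer configuration (hypothetical values)
    net: "train_val.prototxt"     # the model definition lives in its own file
    type: "Adam"                  # solver type; "SGD" is the default
    base_lr: 0.0001               # assumed starting learning rate
    momentum: 0.9                 # Adam beta1
    momentum2: 0.999              # Adam beta2
    delta: 1e-8                   # Adam epsilon
    lr_policy: "fixed"
    display: 20                   # print the loss every 20 iterations
    test_iter: 40                 # discussed below
    test_interval: 5000           # discussed below
    max_iter: 100000              # discussed below
    snapshot: 5000
    snapshot_prefix: "snapshots/fcn"
    solver_mode: GPU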
In order to set these values in a meaningful manner, you need to have a few more bits of information regarding your data:
1. Training set size - the total number of training examples you have; let's call this quantity T.
2. Training batch size - the number of training examples processed together in a single batch; this is usually set by the batch_size of the input data layer in the 'train_val.prototxt' (for example, 256; see the sketch after this list). Let's denote this quantity by tb.
3. Validation set size - the total number of examples you set aside for validating your model; let's denote this by V.
4. Validation batch size - the batch_size set for the TEST-phase data layer (for example, 50). Let's call this vb.
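As a sketch of where tb and vb live, this is what the two data layers in 'train_val.prototxt' might look like; the LMDB paths and batch sizes are assumptions for illustration:

    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TRAIN }
      data_param {
        source: "train_lmdb"   # hypothetical path
        batch_size: 256        # this is tb
        backend: LMDB
      }
    }
    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TEST }
      data_param {
        source: "val_lmdb"     # hypothetical path
        batch_size: 50         # this is vb
        backend: LMDB
      }
    }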
Now, during training, you would like to get an unbiased estimate of the performance of your net every once in a while. To do so you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb.
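For instance, assuming V = 2000 validation images and vb = 50 (both numbers made up for illustration), the corresponding solver entry would be:

    # test_iter = V / vb = 2000 / 50
    test_iter: 40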
How often would you like to get this estimate? It's really up to you. If you have a very large validation set and a slow net, validating too often will make the training process too long. On the other hand, not validating often enough may prevent you from noticing if and when your training process fails to converge. test_interval determines how often you validate: usually for large nets you set test_interval in the order of 5K iterations; for smaller and faster nets you may choose lower values. Again, it's all up to you.
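As a concrete (assumed) setting in the solver file:

    # validate every 5K training iterations;
    # note that test_interval is measured in iterations, not epochs
    test_interval: 5000

If you prefer to validate roughly once per epoch, you can instead set test_interval to about T/tb.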
In order to cover the entire training set (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, thus max_iter = #epochs * T/tb.
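For example, under the assumption of T = 10000 training images, tb = 256 and a target of 50 epochs:

    # one epoch = T / tb = 10000 / 256 ~= 40 iterations (rounded up)
    # max_iter = #epochs * T / tb = 50 * 40
    max_iter: 2000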
Regarding iter_size: this allows you to average gradients over several training mini-batches before applying a parameter update; see this thread for more information.
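This matters for FCNs in particular, where memory constraints often force a very small per-iteration batch_size. As a sketch (all numbers assumed), the effective batch size is batch_size * iter_size, and tb in the formulas above should be read as that effective size:

    # data layer batch_size: 1 (assumed, e.g. because full images do not
    # fit in GPU memory); accumulate gradients over 20 mini-batches:
    iter_size: 20
    # effective training batch size tb = 1 * 20 = 20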