I am trying to understand the basics of Caffe, in particular how to use it with Python.
My understanding is that the model definition (say, a given neural net architecture) must be included in the '.prototxt' file.
And that when you train the model on data using the '.prototxt', you save the weights/model parameters to a '.caffemodel' file.
Also, there is a difference between the '.prototxt' file used for training (which includes learning rate and regularization parameters) and the one used for testing/deployment, which does not include them.
Questions:

1. Is it correct that the '.prototxt' is the basis for training, and that the '.caffemodel' is the result of training (weights), obtained by using the '.prototxt' on the training data?

2. Is it correct that there is one '.prototxt' for training and one for testing, and that there are only slight differences (learning rate and regularization factors on training), but that the nn architecture (assuming you use neural nets) is the same?

Apologies for such basic questions and possibly some very incorrect assumptions; I am doing some online research, and the lines above summarize my understanding to date.
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework, originally developed at the University of California, Berkeley. It is open source, under a BSD license, and is written in C++ with a Python interface.
According to internal benchmarking at Facebook, Caffe outperforms TensorFlow by 1.2 to 5 times. TensorFlow works well on both images and sequences and has been voted the most-used deep learning library, whereas Caffe works well on images but does not handle sequences and recurrent neural networks well.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors.
TensorFlow is basically a software library for numerical computation using data flow graphs, whereas Caffe is a deep learning framework written in C++ with an expressive architecture that lets you easily switch between the CPU and GPU.
Let's take a look at one of the examples provided with BVLC/caffe: bvlc_reference_caffenet.
You'll notice that there are in fact 3 '.prototxt' files:

- train_val.prototxt: this file describes the net architecture for the training phase.
- deploy.prototxt: this file describes the net architecture for test time ("deployment").
- solver.prototxt: this file is very small and contains "meta parameters" for training, for example the learning rate policy, regularization, etc.

The net architecture represented by train_val.prototxt and deploy.prototxt should be mostly similar. There are a few main differences between the two:
1. Input data: during training one usually uses a predefined set of inputs for training/validation. Therefore, train_val usually contains an explicit input layer, e.g., an "HDF5Data" layer or a "Data" layer. On the other hand, deploy usually does not know in advance what inputs it will get; it only contains a statement:
input: "data"
input_shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
that declares what input the net expects and what its dimensions should be.
Alternatively, one can put an "Input" layer:
layer {
name: "input"
type: "Input"
top: "data"
input_param { shape { dim: 10 dim: 3 dim: 227 dim: 227 } }
}
2. Loss: during training one must define a loss layer; the loss is what drives back-propagation. This layer has no place in deploy: in deploy there is no loss and no back-propagation.

In caffe, you supply a train_val.prototxt describing the net, the train/val datasets and the loss. In addition, you supply a solver.prototxt describing the meta parameters for training. The output of the training process is a .caffemodel binary file containing the trained parameters of the net.
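As a rough sketch of this training step in Python (a sketch, not the only way to do it; it assumes Caffe was built with its Python interface and that solver.prototxt and the files it references exist, and the output file name is just illustrative):

import caffe

caffe.set_mode_cpu()  # use caffe.set_mode_gpu() instead on a GPU build

# get_solver reads solver.prototxt, which in turn points to train_val.prototxt
solver = caffe.get_solver('solver.prototxt')

# run the whole optimization; snapshot .caffemodel files are written
# according to the snapshot settings in solver.prototxt
solver.solve()

# the trained weights can also be saved explicitly
solver.net.save('trained.caffemodel')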
Once the net is trained, you can use the deploy.prototxt with the .caffemodel parameters to predict outputs for new and unseen inputs.
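For example, a minimal prediction sketch in Python (file names follow the training sketch above; the 3x227x227 shape matches the input_shape declared earlier, and 'prob' is the output blob of bvlc_reference_caffenet):

import numpy as np
import caffe

# architecture comes from deploy.prototxt, weights from the .caffemodel
net = caffe.Net('deploy.prototxt', 'trained.caffemodel', caffe.TEST)

# since deploy does not fix the inputs in advance, the data blob can be
# reshaped at run time, e.g., to a single image:
net.blobs['data'].reshape(1, 3, 227, 227)

# fill the "data" blob with (preprocessed) input and run a forward pass;
# no loss is computed and no back-propagation takes place
net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)
out = net.forward()
print(out['prob'].shape)  # class probabilities, one row per input image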