I am trying to understand the basics of Caffe, in particular how to use it with Python.
My understanding is that the model definition (say, a given neural net architecture) must be included in the '.prototxt' file.
And that when you train the model on data using the '.prototxt', you save the weights/model parameters to a '.caffemodel' file.
Also, there is a difference between the '.prototxt' file used for training (which includes learning rate and regularization parameters) and the one used for testing/deployment, which does not include them.
Questions:

1. Is it correct that the '.prototxt' is the basis for training, and that the '.caffemodel' is the result of training (weights), obtained by using the '.prototxt' on the training data?

2. Is it correct that there is one '.prototxt' for training and one for testing, and that there are only slight differences (learning rate and regularization factors on training), but that the nn architecture (assuming you use neural nets) is the same?

Apologies for such basic questions and possibly some very incorrect assumptions; I am doing some online research, and the lines above summarize my understanding to date.
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework, originally developed at the University of California, Berkeley. It is open source, under a BSD license, and is written in C++ with a Python interface.
According to internal benchmarking at Facebook, Caffe outperforms TensorFlow by 1.2 to 5 times. TensorFlow works well on both images and sequences and has been voted the most-used deep learning library, whereas Caffe works well on images but does not handle sequences and recurrent neural networks well.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors.
TensorFlow is basically a software library for numerical computation using data flow graphs, whereas Caffe is a deep learning framework written in C++ with an expressive architecture that lets you easily switch between the CPU and GPU.
Let's take a look at one of the examples provided with BVLC/caffe: bvlc_reference_caffenet.
You'll notice that there are in fact 3 '.prototxt' files:

- train_val.prototxt: this file describes the net architecture for the training phase.
- deploy.prototxt: this file describes the net architecture for test time ("deployment").
- solver.prototxt: this file is very small and contains "meta parameters" for training, for example the learning rate policy, regularization, etc.

The net architecture represented by train_val.prototxt and deploy.prototxt should be mostly similar. There are a few main differences between the two:
1. Input data: during training one usually uses a predefined set of inputs for training/validation. Therefore, train_val usually contains an explicit input layer, e.g., an "HDF5Data" layer or a "Data" layer. On the other hand, deploy usually does not know in advance what inputs it will get; it only contains a statement:
input: "data"
input_shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
that declares what input the net expects and what its dimensions should be.
Alternatively, one can put an "Input" layer:
layer {
name: "input"
type: "Input"
top: "data"
input_param { shape { dim: 10 dim: 3 dim: 227 dim: 227 } }
}
2. Loss: during training one must define a loss layer; the loss is what drives back-propagation. This layer has no place in deploy: in deploy there is no loss and no back-propagation.

In caffe, you supply a train_val.prototxt describing the net, the train/val datasets and the loss. In addition, you supply a solver.prototxt describing the meta parameters for training. The output of the training process is a .caffemodel binary file containing the trained parameters of the net.
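As a rough sketch of this training step in Python (a sketch, not the only way to do it; it assumes Caffe was built with its Python interface and that solver.prototxt and the files it references exist, and the output file name is just illustrative):

import caffe

caffe.set_mode_cpu()  # use caffe.set_mode_gpu() instead on a GPU build

# get_solver reads solver.prototxt, which in turn points to train_val.prototxt
solver = caffe.get_solver('solver.prototxt')

# run the whole optimization; snapshot .caffemodel files are written
# according to the snapshot settings in solver.prototxt
solver.solve()

# the trained weights can also be saved explicitly
solver.net.save('trained.caffemodel')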
Once the net is trained, you can use the deploy.prototxt with the .caffemodel parameters to predict outputs for new and unseen inputs.
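For example, a minimal prediction sketch in Python (file names follow the training sketch above; the 3x227x227 shape matches the input_shape declared earlier, and 'prob' is the output blob of bvlc_reference_caffenet):

import numpy as np
import caffe

# architecture comes from deploy.prototxt, weights from the .caffemodel
net = caffe.Net('deploy.prototxt', 'trained.caffemodel', caffe.TEST)

# since deploy does not fix the inputs in advance, the data blob can be
# reshaped at run time, e.g., to a single image:
net.blobs['data'].reshape(1, 3, 227, 227)

# fill the "data" blob with (preprocessed) input and run a forward pass;
# no loss is computed and no back-propagation takes place
net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)
out = net.forward()
print(out['prob'].shape)  # class probabilities, one row per input image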