I am currently trying to implement my own loss layer in Caffe, and while attempting to do so I am using other layers as a reference. One thing that puzzles me, however, is the use of top[0]->cpu_diff() in Backward_cpu. I will be using the EuclideanLossLayer as a reference. Here are my questions:

It is my understanding that top[0]->cpu_diff() holds the error derivative from the next layer, but what if there is no next layer? How is it initialised? It is used in EuclideanLossLayer without performing any checks:

const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
Again, in the EuclideanLossLayer, the derivative of the error with respect to the activations is calculated using the following code snippet:

const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
caffe_cpu_axpby(
    bottom[i]->count(),              // count
    alpha,                           // alpha
    diff_.cpu_data(),                // a
    Dtype(0),                        // beta
    bottom[i]->mutable_cpu_diff());  // b
If my first assumption is correct, and top[0]->cpu_diff() does indeed hold the error derivative from the layer above, why do we only use the first element, i.e. top[0]->cpu_diff()[0], as opposed to multiplying by the whole vector, i.e. top[0]->cpu_diff()?
For loss layers there is no next layer, and so the top diff blob is technically undefined and unused. But Caffe repurposes this preallocated space to store unrelated data: Caffe supports scaling loss layers by a user-defined weight (loss_weight in the prototxt), and that weight, a single scalar floating-point number, is stored in the first element of the diff array of the top blob. That is why every loss layer multiplies by that amount: to support that functionality. This is explained in Caffe's tutorial about the loss layer.
This weight is usually used to add auxiliary losses to the network. You can read more about it in Google's Going Deeper with Convolutions or in Deeply-Supervised Nets.
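For reference, the weight is set directly on the loss layer in the prototxt. The layer and blob names below are hypothetical; only the loss_weight field is the point:

```prototxt
layer {
  name: "aux_loss"
  type: "EuclideanLoss"
  bottom: "aux_pred"
  bottom: "label"
  top: "aux_loss"
  loss_weight: 0.3  # Caffe stores this scalar in top[0]->cpu_diff()[0]
}
```

A loss layer with no loss_weight specified defaults to a weight of 1, which is why top[0]->cpu_diff()[0] is 1 in the common single-loss case.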