Euclidean Loss Layer in Caffe

I am currently trying to implement my own loss layer in Caffe, and while doing so I am using other layers as a reference. One thing that puzzles me, however, is the use of top[0]->cpu_diff() in Backward_cpu. I will be using the EuclideanLossLayer as a reference. Here are my questions:

  • It is my understanding that top[0]->cpu_diff() holds the error derivative from the next layer, but what if there is no next layer? How is it initialised? It is used in EuclideanLossLayer without any checks:

    const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
    
  • Again, in the EuclideanLossLayer, the derivative for the error with respect to the activations is calculated using the following code snippet:

    const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
    caffe_cpu_axpby(
      bottom[i]->count(),                 // count
      alpha,                              // alpha
      diff_.cpu_data(),                   // a
      Dtype(0),                           // beta
      bottom[i]->mutable_cpu_diff());     // b
    

    If my first assumption is correct, and top[0]->cpu_diff() does indeed hold the error derivative from the layer above, why do we only use the first element, i.e. top[0]->cpu_diff()[0], rather than multiplying by the whole vector, i.e. top[0]->cpu_diff()?
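For reference: caffe_cpu_axpby(n, alpha, a, beta, b) is Caffe's wrapper around the BLAS-style update b = alpha * a + beta * b. With beta set to Dtype(0), the call in the snippet above is therefore elementwise equivalent to the following sketch (names follow the snippet):

    // Elementwise view of the axpby call above (beta == 0):
    //   bottom_diff[k] = alpha * diff_[k]
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    const Dtype* d = diff_.cpu_data();
    for (int k = 0; k < bottom[i]->count(); ++k) {
      bottom_diff[k] = alpha * d[k];
    }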

asked Jun 28 '15 by BitRiver

1 Answer

For loss layers there is no next layer, so the top diff blob is technically undefined and unused. However, Caffe reuses this preallocated space to store unrelated data: Caffe supports scaling each loss layer by a user-defined weight (loss_weight in the prototxt), and this single scalar floating-point value is stored in the first element of the diff array of the top blob. That is why you will see every loss layer multiply by that amount to support this functionality. This is explained in Caffe's tutorial on the loss layer.
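To see how the pieces fit together, here is a lightly annotated sketch of the Euclidean loss backward pass, paraphrased from the snippet in the question (it assumes diff_ already holds the elementwise difference bottom[0] - bottom[1] computed in the forward pass):

    template <typename Dtype>
    void EuclideanLossLayer<Dtype>::Backward_cpu(
        const vector<Blob<Dtype>*>& top,
        const vector<bool>& propagate_down,
        const vector<Blob<Dtype>*>& bottom) {
      for (int i = 0; i < 2; ++i) {
        if (propagate_down[i]) {
          // +1 for the prediction blob, -1 for the label blob
          const Dtype sign = (i == 0) ? 1 : -1;
          // top[0]->cpu_diff()[0] is NOT a gradient from a next layer;
          // Caffe writes the scalar loss_weight there before calling Backward.
          const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
          // bottom_diff = alpha * diff_ + 0 * bottom_diff
          caffe_cpu_axpby(
              bottom[i]->count(),
              alpha,
              diff_.cpu_data(),
              Dtype(0),
              bottom[i]->mutable_cpu_diff());
        }
      }
    }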

This weight is usually used to add auxiliary losses to the network. You can read more about it in Google's Going Deeper with Convolutions or in Deeply-Supervised Nets.
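For example, a minimal prototxt fragment attaching an auxiliary Euclidean loss with a reduced weight might look like this (the layer and blob names here are made up for illustration):

    layer {
      name: "aux_loss"            # hypothetical auxiliary loss layer
      type: "EuclideanLoss"
      bottom: "aux_prediction"    # hypothetical intermediate prediction blob
      bottom: "label"
      top: "aux_loss"
      loss_weight: 0.3            # scalar that ends up in top[0]->cpu_diff()[0]
    }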

answered Oct 22 '22 by Or Sharir