
Caffe output layer number accuracy

I've modified the Caffe MNIST example to classify 3 classes of images. One thing I noticed is that if I set the number of outputs (num_output) of the final layer to 3, my test accuracy drops horribly, down to the low 40% range. However, if I add 1 and use 4 outputs, the result is in the 95% range.
I added an extra class of images to my dataset (so 4 classes) and noticed the same thing: if the number of outputs was the same as the number of classes, the result was horrible; if it was one more than the number of classes, it worked really well.

  inner_product_param {
    num_output: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }

Does anyone know why this is? I've also noticed that when I run the trained model through the C++ example code on an image from my test set, it complains that I've told it there are 4 classes but have only supplied 3 labels in my labels file. If I invent a label and add it to the file, I can get the program to run, but then it just returns one of the classes with a probability of 1.0 no matter what image I give it.

Jack Simpson asked Aug 27 '15 10:08



1 Answer

It is important to note that when fine-tuning and/or changing the number of labels, the input labels must always start from 0, because they are used as indices into the output probability vector when computing the loss.
Thus, if you have

 inner_product_param {
   num_output: 3
 }

you must have training labels 0, 1 and 2 only.
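
One way to get there from labels such as 1, 2, 3 is to remap them before building the training database. The following is only a minimal sketch: it assumes the labels are available as a plain Python list of integers, and `remap_labels` is a hypothetical helper name, not part of Caffe.

```python
def remap_labels(labels):
    """Map arbitrary integer labels to 0..K-1, preserving their sorted order.

    Hypothetical helper for illustration; not a Caffe function.
    """
    classes = sorted(set(labels))
    mapping = {old: new for new, old in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


raw = [1, 3, 2, 2, 1, 3]          # labels that start from 1
zero_based, mapping = remap_labels(raw)
print(zero_based)                 # [0, 2, 1, 1, 0, 2]
print(mapping)                    # {1: 0, 2: 1, 3: 2}
```

After remapping, the largest label equals the number of classes minus one, so num_output can match the number of classes. The labels file used at test time would need to follow the same order.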

If you use num_output: 3 with labels 1, 2, 3, Caffe is unable to represent label 3, and in fact has a redundant row corresponding to label 0 that is left unused.
As you observed, when changing to num_output: 4 Caffe is again able to represent label 3 and the results improve, but you still have an unused row in the parameter matrix.
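
To illustrate why the label is treated as an index, here is a small pure-Python sketch of a softmax loss. It is not Caffe's implementation: the function names are made up for this example, and the explicit `IndexError` is added only to make the out-of-range label visible.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, label):
    """Negative log-probability of the true class; label indexes into probs."""
    probs = softmax(scores)
    if not 0 <= label < len(probs):
        # Raised here for illustration only (an assumption of this sketch).
        raise IndexError(f"label {label} has no output among {len(probs)}")
    return -math.log(probs[label])


three_outputs = [0.0, 0.0, 0.0]       # like num_output: 3
four_outputs = [0.0, 0.0, 0.0, 0.0]   # like num_output: 4

print(round(softmax_loss(three_outputs, 2), 4))  # 1.0986, label 2 is valid
print(round(softmax_loss(four_outputs, 3), 4))   # 1.3863, label 3 now fits

try:
    softmax_loss(three_outputs, 3)    # labels 1,2,3 with only 3 outputs
except IndexError as err:
    print("error:", err)
```

With labels 1, 2, 3 and four outputs, index 0 is never the target of any example, which corresponds to the unused row mentioned above.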

Shai answered Sep 24 '22 22:09