Caffe output layer number accuracy

Tags: machine-learning, neural-network, deep-learning, computer-vision, caffe

I've modified the Caffe MNIST example to classify 3 classes of images. One thing I noticed was that if I set the number of outputs of the final layer (num_output) to 3, my test accuracy drops horribly, down to the low 40% range. However, if I add 1 and use 4 outputs, the result is in the 95% range.
I added an extra class of images to my dataset (so 4 classes) and noticed the same thing: if the number of outputs was the same as the number of classes, the result was horrible; if it was one more than the number of classes, it worked really well.

  inner_product_param {
    num_output: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }

Does anyone know why this is? I've also noticed that when I run the trained model through the C++ example code on an image from my test set, it complains that I've told it there are 4 classes while my labels file only supplies 3 labels. If I invent a label and add it to the file, the program runs, but then it returns one of the classes with a probability of 1.0 no matter what image I give it.

asked Aug 27 '15 10:08

Jack Simpson

1 Answer

Note that when fine-tuning and/or changing the number of labels, the input labels must always start from 0, because they are used as indices into the output probability vector when computing the loss.
Thus, if you have

 inner_product_param {
   num_output: 3
 }

you must have training labels 0, 1 and 2 only.
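If the labels in your dataset currently start from 1, one way to bring them into the 0-based range is to remap them before building the training database. The following is only a minimal sketch in plain Python; the helper name `remap_labels` and the sample label list are made up for illustration and are not part of Caffe.

```python
def remap_labels(labels):
    """Map arbitrary integer labels onto 0..K-1, preserving their sorted order."""
    distinct = sorted(set(labels))
    mapping = {old: new for new, old in enumerate(distinct)}
    return [mapping[y] for y in labels], mapping


# Hypothetical labels as they might appear in a training list file.
original = [1, 3, 2, 2, 1, 3]
remapped, mapping = remap_labels(original)
print(remapped)  # [0, 2, 1, 1, 0, 2]
print(mapping)   # {1: 0, 2: 1, 3: 2}
```

After remapping, the largest label is K-1, so num_output can be set to exactly K. The same mapping has to be applied to the test labels and to the order of names in the labels file.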

If you use num_output: 3 with labels 1, 2, 3, Caffe is unable to represent label 3, and it also has a redundant row (the one corresponding to label 0) that is left unused.
As you observed, when you change to num_output: 4, Caffe is again able to represent label 3 and the results improve, but you still have an unused row in the parameter matrix.
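To make the indexing concrete, here is a small stand-alone sketch of a softmax loss in plain Python. It is not Caffe's implementation: the scores are invented, and the explicit range check is an assumption added for illustration, so it says nothing about how Caffe itself reacts to an out-of-range label. It only shows that the label selects one entry of the probability vector, so with three outputs there is no entry for label 3.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, label):
    """Negative log of the probability stored at index `label`."""
    probs = softmax(scores)
    if not 0 <= label < len(probs):
        # Illustrative check only; not taken from Caffe.
        raise IndexError(f"label {label} has no entry among {len(probs)} outputs")
    return -math.log(probs[label])


# Made-up scores from a final layer with num_output: 3.
scores = [2.0, 0.5, -1.0]

for label in (0, 1, 2, 3):
    try:
        print(label, round(softmax_loss(scores, label), 3))
    except IndexError as err:
        print(label, "->", err)
```

Labels 0, 1 and 2 each pick out one probability, while label 3 has nothing to index. Adding a fourth output makes index 3 valid again, at the cost of an output for label 0 that no training example ever uses.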

answered Sep 24 '22 22:09

Shai
