How best to deal with "None of the above" in Image Classification?

Tags: classification, softmax, cntk

This seems to be a fundamental question that some of you out there must have an opinion on. I have an image classifier implemented in CNTK with 48 classes. If an image does not match any of the 48 classes very well, I'd like to be able to conclude that it is not one of these 48 image types. My original idea was simply that if the highest output of the final softmax layer was low, I could conclude that the test image matched none of the classes well. While I occasionally see this happen, in most testing softmax still produces a very high (and mistaken) result when handed an 'unknown image type'. But maybe my network is overfit, and if it weren't, my original idea would work fine. What do you think? Is there any way to define a 49th class called 'none-of-the-above'?

asked Apr 24 '17 02:04

Tullhead

2 Answers

You really do have these two options: thresholding the posterior probabilities (softmax values), and adding a garbage class.

In my area (speech), both approaches have their place:

If "none of the above" inputs are of the same nature as the "above" (e.g. non-grammatical inputs), thresholding works fine. Note that the posterior probability for a class is equal to one minus an estimate of the error rate for choosing this class. Rejecting anything with posterior < 50% would be rejecting all cases where you are more likely wrong than right. As long as your none-of-the-above classes are of similar nature, the estimate may be accurate enough to make this correct for them as well.
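As a rough illustration of the thresholding rule described above, here is a minimal sketch in plain Python. The function names (`softmax`, `classify_with_reject`) and the default threshold of 0.5 are assumptions made for this example; they are not part of CNTK or of the original poster's code, and the threshold would need tuning on held-out data.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_reject(logits, threshold=0.5):
    """Return (class_index, posterior), or (None, posterior) if rejected.

    The top posterior is read as one minus the estimated error rate for
    choosing that class, so anything below `threshold` is rejected.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # "none of the above"
    return best, probs[best]
```

Calling `classify_with_reject(logits)` on the output vector of the final layer then yields either a class index or `None` for a rejected input.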

If "none of the above" inputs are of similar nature but your number of classes is very small (e.g. 10 digits), or if the inputs are of a totally different nature (e.g. a sound of a door slam or someone coughing), thresholding typically fails. Then, one would train a "garbage model." In our experience, it is OK to include the training data for the correct classes when training it. Now the none-of-the-above class may match a correct class as well. But that's OK as long as the none-of-the-above class is not overtrained--its distribution will be much flatter, and thus even if it matches a known class, it will match it with a lower score and thus not win against the actual known class's softmax output.
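One possible reading of the garbage-model idea, translated to a softmax classifier, is to add one extra label index and assign it to every out-of-set example, optionally also to copies of the in-set examples. The sketch below only builds such a label list; the helper name `add_garbage_class` and the `include_known` flag are hypothetical, and whether duplicating known examples under the garbage label helps for images is an assumption that should be checked experimentally.

```python
def add_garbage_class(known, unknown, num_known_classes, include_known=True):
    """Build (example, label) pairs with one extra 'none of the above' label.

    known:   list of (example, label) pairs with labels 0..num_known_classes-1
    unknown: list of examples that belong to none of the known classes
    """
    garbage = num_known_classes  # e.g. index 48 when the known classes are 0..47
    labelled = list(known)
    labelled.extend((x, garbage) for x in unknown)
    if include_known:
        # Assumption: known examples may also be shown under the garbage label,
        # which should give that class a flatter, lower-scoring distribution.
        labelled.extend((x, garbage) for x, _ in known)
    return labelled, garbage
```

The network's output layer would then need `num_known_classes + 1` units, and the garbage class should not be trained so heavily that it outscores the real classes.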

In the end, I would use both. Definitely use a threshold (to catch the cases that the system can rule out), and also use a garbage model, which I would just train on whatever you have. I would expect that including the correct examples in training will not harm, even if it is the only data you have (please check the paper Anton posted for whether that applies to images as well). It may also make sense to try to synthesize data, e.g. by randomly combining patches from different images.
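The patch-combination idea could be sketched as follows. This is only one of many ways to do it: images are represented as equally sized 2-D lists of pixel values, each output tile is copied from the same position in a randomly chosen source image, and the function name `synthesize_patchwork` and the `patch` parameter are made up for this example.

```python
import random

def synthesize_patchwork(images, patch, rng=None):
    """Build one synthetic image by tiling patches from random source images.

    images: non-empty list of equally sized 2-D lists (height x width)
    patch:  side length of the square tiles, in pixels
    rng:    optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    height, width = len(images[0]), len(images[0][0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            src = rng.choice(images)  # pick a source image for this tile
            for r in range(top, min(top + patch, height)):
                for c in range(left, min(left + patch, width)):
                    out[r][c] = src[r][c]
    return out
```

Images produced this way could be labelled as "none of the above" when training the garbage class, for instance when no genuine out-of-set data is available.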

answered Sep 18 '22 14:09

Frank Seide MSFT

I agree with you that this is a key question, but I am not aware of much work in that area either.

There's one recent paper by Zhang and LeCun that addresses the question for image classification in particular. They use large quantities of unlabelled data to create an additional "none of the above" class. The catch, though, is that in some cases their unlabelled data is not completely unlabelled, and they have means of removing "unlabelled" images that actually belong to one of their labelled classes. Having said that, the authors report that, apart from solving the "none of the above" problem, they even see performance gains on their test sets.

As for fitting something post-hoc, just by looking at the outputs of the softmax, I can't provide any pointers.

answered Sep 21 '22 14:09

Anton Schwaighofer
