 

Can you fix the false negative rate in a classifier in scikit-learn?

I am using a Random Forest classifier in scikit-learn with an imbalanced data set of two classes. I am much more worried about false negatives than false positives. Is it possible to fix the false negative rate (to, say, 1%) and ask scikit to optimize the false positive rate somehow?

If this classifier doesn't support it, is there another classifier that does?

asked Sep 17 '15 by graffe

People also ask

How do you reduce false negatives in Python?

To reduce the number of false negatives (FN) or false positives (FP), you can retrain a model on the same data with the targets or sample weights adjusted in light of its previous errors, so that the error type you care about is penalized more heavily, and train until the loss converges.
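
As a hedged illustration of the reweighting idea, the sketch below retrains a random forest with scikit-learn's sample_weight argument so that positive samples count more; the weight of 10 and the synthetic dataset are arbitrary assumptions, not tuned values.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Imbalanced toy data: roughly 10% positives (class 1).
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

    # Make each positive sample count 10x, so the trees pay more
    # for misclassifying positives (i.e., for false negatives).
    weights = np.where(y == 1, 10.0, 1.0)

    clf = RandomForestClassifier(random_state=0)
    clf.fit(X, y, sample_weight=weights)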

How can you reduce false positives in binary classification?

Another common method for reducing false negatives or false positives is moving the decision threshold. The default threshold in binary classification models is 0.5: when the predicted probability of the positive class exceeds 0.5, the prediction is considered positive.
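
Threshold moving is also the closest answer to the original question of fixing the false negative rate at roughly 1%. Below is a rough sketch under assumed synthetic data: it scans candidate thresholds on a validation set and keeps the largest one whose FNR stays at or below the target (FNR can only grow as the threshold rises, so the largest feasible threshold gives the fewest false positives).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Imbalanced toy data: ~10% positives (class 1), the class we must not miss.
    X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_val)[:, 1]  # estimated P(class == 1)

    # Keep the largest threshold whose validation FNR is still <= 1%.
    target_fnr = 0.01
    best_t = 0.0
    for t in np.linspace(0.0, 1.0, 101):
        pred = (proba >= t).astype(int)
        fnr = np.mean(pred[y_val == 1] == 0)  # share of true positives missed
        if fnr <= target_fnr:
            best_t = t

    print("threshold meeting the FNR target:", best_t)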

What is false negative in classification?

A false positive is an outcome where the model incorrectly predicts the positive class, and a false negative is an outcome where the model incorrectly predicts the negative class. Together with true positives and true negatives, these four outcomes are the basis for evaluation metrics such as precision and recall.
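
For reference, a minimal sketch of reading those four outcomes off scikit-learn's confusion_matrix (the toy labels are made up):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1, 1]

    # For binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("false positives:", fp, "false negatives:", fn)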


2 Answers

I believe the problem of class imbalance in sklearn can be partially resolved by using the class_weight parameter.

This parameter is either a dictionary mapping each class to a weight, or a string that tells sklearn how to build that dictionary. For instance, setting it to 'balanced' (called 'auto' in older versions) weights each class in inverse proportion to its frequency.

By giving the under-represented class a higher weight, you can end up with 'better' results.

Classifiers like SVM or logistic regression also offer this class_weight parameter.
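
A minimal sketch of class_weight on both a random forest and a logistic regression; the explicit {0: 1, 1: 10} dictionary is only an illustration, not a recommended setting:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

    # 'balanced' weights each class by the inverse of its frequency.
    rf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, y)

    # The same parameter accepts an explicit dict; here errors on class 1 cost 10x.
    lr = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X, y)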

This Stack Overflow answer gives some other ideas on how to handle class imbalance, such as undersampling and oversampling.
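
As a rough sketch of the oversampling idea using only scikit-learn utilities, the hypothetical helper below (oversample_minority is not a library function) duplicates minority-class rows with sklearn.utils.resample until the classes are balanced:

    import numpy as np
    from sklearn.utils import resample

    def oversample_minority(X, y, minority_label=1, random_state=0):
        # Split rows by class, then draw minority rows with replacement
        # until both classes have the same number of samples.
        X_maj, y_maj = X[y != minority_label], y[y != minority_label]
        X_min = X[y == minority_label]
        X_min_up = resample(X_min, replace=True, n_samples=len(X_maj),
                            random_state=random_state)
        X_bal = np.vstack([X_maj, X_min_up])
        y_bal = np.concatenate([y_maj, np.full(len(X_maj), minority_label)])
        return X_bal, y_bal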

answered Sep 20 '22 by DJanssens


I found this article on the class imbalance problem.

http://www.chioka.in/class-imbalance-problem/

In summary, it discusses the following possible solutions:

  • Cost function based approaches
  • Sampling based approaches
  • SMOTE (Synthetic Minority Over-Sampling Technique; see the sketch after this list)
  • Recent approaches: RUSBoost, SMOTEBagging and UnderBagging
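
A hedged sketch of SMOTE, assuming the third-party imbalanced-learn package (imblearn) is installed; the synthetic dataset is an arbitrary stand-in:

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

    # SMOTE synthesizes new minority samples by interpolating
    # between a minority point and its nearest minority neighbors.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)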

Hope it helps.

answered Sep 20 '22 by Pappu Jha