From the business perspective, false negatives lead to about tenfold higher costs (real money) than false positives. Given my standard binary classification models (logit, random forest, etc.), how can I incorporate this into my model?
Do I have to change (weight) the loss function in favor of the 'preferred' error (FP)? If so, how can I do that?
To minimize the number of false negatives (FN) or false positives (FP), we can also retrain a model on the same data with slightly adjusted target values that reflect its previous errors, training it until the loss converges.
Current methods available for minimizing cases like false negatives include changing the class weights, performing data augmentation to create a biased dataset, and shifting the decision boundary [2].
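As an illustration of the data-augmentation option, here is a minimal sketch, assuming numpy arrays and that class 1 is the costly-to-miss class; oversample_positives is a hypothetical helper, and the tenfold factor mirrors the cost ratio from the question:

import numpy as np

def oversample_positives(X, y, factor=10):
    # duplicate each positive-class row (factor - 1) extra times,
    # biasing the training set toward the costly-to-miss class
    pos = y == 1
    X_aug = np.concatenate([X, np.repeat(X[pos], factor - 1, axis=0)])
    y_aug = np.concatenate([y, np.repeat(y[pos], factor - 1)])
    return X_aug, y_aug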
As we discussed, a false negative result is worse than a false positive, since a bug stays in the code indefinitely. We introduced a technique called mutation testing; using it, test engineers can identify false negatives in code.
A patient may be diagnosed with diabetes when they do not actually have the disease; this is a false positive, and it can lead to unnecessary medical treatment. A false negative, on the other hand, is when the test shows that a patient does not have diabetes when they actually do.
There are several options for you:
As suggested in the comments, class_weight should boost the loss function towards the preferred class. This option is supported by various estimators, including sklearn.linear_model.LogisticRegression, sklearn.svm.SVC, sklearn.ensemble.RandomForestClassifier, and others. Note that there is no theoretical limit to the weight ratio, so even if 1 to 100 isn't strong enough for you, you can go on with 1 to 500, etc.
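For instance, a minimal sketch with scikit-learn, assuming class 1 is the costly-to-miss positive class and using a 1:10 ratio to mirror the cost ratio from the question:

from sklearn.linear_model import LogisticRegression

# weight errors on class 1 ten times heavier than errors on class 0
clf = LogisticRegression(class_weight={0: 1, 1: 10})
clf.fit(x_train, y_train)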
You can also select the decision threshold very low during cross-validation to pick the model that gives the highest recall (though possibly low precision). Recall close to 1.0 effectively means the number of false negatives is close to 0, which is what you want. For that, use the sklearn.model_selection.cross_val_predict and sklearn.metrics.precision_recall_curve functions:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# out-of-fold decision scores for every training sample
y_scores = cross_val_predict(classifier, x_train, y_train, cv=3,
                             method="decision_function")
precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
If you plot the precisions and recalls against the thresholds, you will see the usual trade-off: as the threshold rises, recall falls while precision grows. Pick a threshold where recall is still close to 1.0 and precision remains acceptable.
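A minimal matplotlib sketch of that plot, reusing the variables from the snippet above:

import matplotlib.pyplot as plt

# precision_recall_curve returns one more precision/recall value than
# thresholds, so drop the last element to align the arrays
plt.plot(thresholds, precisions[:-1], label="precision")
plt.plot(thresholds, recalls[:-1], label="recall")
plt.xlabel("decision threshold")
plt.legend()
plt.show()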
After picking the best threshold, you can use the raw scores from the classifier.decision_function() method for your final classification.
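For example, here is a sketch of picking the highest threshold that still reaches a target recall; the 0.95 target and the x_test variable are assumptions, so substitute your own requirement and data:

target_recall = 0.95  # assumed business requirement
# recall is non-increasing in the threshold, so take the highest
# threshold that still meets the target
threshold = thresholds[recalls[:-1] >= target_recall][-1]

classifier.fit(x_train, y_train)
y_pred = (classifier.decision_function(x_test) >= threshold).astype(int)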
Finally, try not to over-optimize your classifier, because you can easily end up with a trivial constant classifier that always predicts the positive class: it has zero false negatives, but it is useless.
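To see why, a quick check with scikit-learn's DummyClassifier:

from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score, recall_score

# always predicting the positive class gives recall 1.0 by construction,
# but precision collapses to the positive-class prevalence
dummy = DummyClassifier(strategy="constant", constant=1).fit(x_train, y_train)
y_pred = dummy.predict(x_train)
print(recall_score(y_train, y_pred), precision_score(y_train, y_pred))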