For a simple binary classification problem, I would like to find what threshold setting maximizes the F1 score, which is the harmonic mean of precision and recall. Is there any built-in in scikit-learn that does this? Right now, I am simply calling
precision, recall, thresholds = precision_recall_curve(y_test, y_test_predicted_probas)
And then, I can compute the f1 score using the information at each index in the triplet of arrays:
curr_f1 = compute_f1(precision[index], recall[index])
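where compute_f1 is just a small helper of mine; roughly, it is the harmonic mean with a guard against division by zero, and the loop over indices looks something like this:

def compute_f1(p, r):
    # F1 is the harmonic mean of precision and recall
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

best_index = max(range(len(thresholds)), key=lambda i: compute_f1(precision[i], recall[i]))
best_threshold = thresholds[best_index]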
Is there a better way of doing this, or is this how the library was intended to be used? Thanks.
For a binary classification task, the higher the F1 score the better, with 0 being the worst possible value and 1 being the best. Beyond that, most online sources don't give you much guidance on how to interpret a specific F1 score.
For example, perfect precision and recall result in a perfect F-Measure:
F-Measure = (2 * Precision * Recall) / (Precision + Recall)
F-Measure = (2 * 1.0 * 1.0) / (1.0 + 1.0)
F-Measure = (2 * 1.0) / 2.0
F-Measure = 1.0
After calling precision_recall_curve you get the precision, recall and threshold values back as NumPy arrays.
Just use NumPy to find the threshold that maximizes the F1 score:
import numpy as np

# precision and recall have one more entry than thresholds (the final point has
# no corresponding threshold), so drop it before computing the per-threshold F1
f1_scores = 2 * recall[:-1] * precision[:-1] / (recall[:-1] + precision[:-1])
print('Best threshold: ', thresholds[np.argmax(f1_scores)])
print('Best F1-Score: ', np.max(f1_scores))
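If you then want hard labels at that threshold, you can binarize the probabilities yourself; as a sanity check, sklearn.metrics.f1_score on those labels should match the maximum F1 found above (variable names follow the question):

import numpy as np
from sklearn.metrics import f1_score

best_threshold = thresholds[np.argmax(f1_scores)]
# precision_recall_curve treats a sample as positive when its score is >= the threshold
y_pred = (y_test_predicted_probas >= best_threshold).astype(int)
print('F1 at best threshold: ', f1_score(y_test, y_pred))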
Sometimes precision_recall_curve picks a few thresholds that are too high for the data, so you end up with points where both precision and recall are zero. This can result in nans when computing F1 scores. To ensure correct output, use np.divide to only divide where the denominator is nonzero:
import numpy as np
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, y_test_predicted_probas)

# drop the final precision/recall point, which has no corresponding threshold
numerator = 2 * recall[:-1] * precision[:-1]
denom = recall[:-1] + precision[:-1]
# divide only where the denominator is nonzero; elsewhere F1 stays at 0
f1_scores = np.divide(numerator, denom, out=np.zeros_like(denom), where=(denom != 0))
max_f1 = np.max(f1_scores)
max_f1_thresh = thresholds[np.argmax(f1_scores)]
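For a self-contained demonstration, here is a quick end-to-end sketch on synthetic data; the dataset and model are only placeholders for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

# imbalanced toy problem, so the F1-optimal threshold is usually not 0.5
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probas = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

precision, recall, thresholds = precision_recall_curve(y_test, probas)
numerator = 2 * recall[:-1] * precision[:-1]
denom = recall[:-1] + precision[:-1]
f1_scores = np.divide(numerator, denom, out=np.zeros_like(denom), where=(denom != 0))

print('Best threshold: ', thresholds[np.argmax(f1_scores)])
print('Best F1-Score: ', np.max(f1_scores))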