What is a threshold in a Precision-Recall curve?

Tags: machine-learning, classification, precision-recall, auc, model-comparison

I am aware of the concept of Precision as well as the concept of Recall. But I am finding it very hard to understand the idea of a 'threshold' which makes any P-R curve possible.

Imagine I want to build a model that predicts whether a patient's cancer will recur (yes or no), using a decent classification algorithm on relevant features. I split my data into training and test sets. Let's say I trained the model on the training data and computed Precision and Recall on the test data.

But HOW can I draw a P-R curve now? On what basis? I only have two values: one precision and one recall. I read that it's the 'threshold' that lets you get several precision-recall pairs. But what is that threshold? I am still a beginner and I cannot grasp the very concept of the threshold.

I see curves like the one below in many classification model comparisons. But how do they get so many pairs?

[Figure: Model Comparison Using Precision-Recall Curve]


asked Sep 14 '17 17:09

Mr.A

1 Answer

ROC Curves:

x-axis: False Positive Rate FPR = FP / (FP + TN) = FP / N

y-axis: True Positive Rate TPR = Recall = TP /(TP + FN) = TP / P

Precision-Recall Curves:

x-axis: Recall = TP / (TP + FN) = TP / P = TPR

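y-axis: Precision = TP / (TP + FP) = TP / PP

where P = TP + FN (actual positives), N = FP + TN (actual negatives) and PP = TP + FP (predicted positives).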
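To make these formulas concrete, here is a minimal sketch in plain Python. It assumes binary labels encoded as 1 (positive, e.g. the cancer recurs) and 0 (negative); the helper names `confusion_counts` and `rates` are hypothetical, not from any library:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (FPR, TPR = Recall, Precision) from the confusion counts."""
    p = tp + fn   # actual positives
    n = fp + tn   # actual negatives
    pp = tp + fp  # predicted positives
    fpr = fp / n if n else 0.0
    tpr = tp / p if p else 0.0
    # Precision is 0/0 when nothing is predicted positive; report None then.
    precision = tp / pp if pp else None
    return fpr, tpr, precision


# Hypothetical labels and hard predictions for 6 patients.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # (2, 1, 2, 1)
print(rates(tp, fp, tn, fn))  # roughly (0.333, 0.667, 0.667)
```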

Your cancer detection example is a binary classification problem. Your predictions are based on a probability: the probability of having (or not having) cancer.

In general, an instance is classified as A if P(A) > 0.5; 0.5 is your threshold value. At this threshold you get one Recall-Precision pair, computed from the counts of True Positives, True Negatives, False Positives and False Negatives.
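A minimal sketch of what the 0.5 threshold does, assuming your model outputs one probability per patient (as many scikit-learn classifiers do via `predict_proba`). The labels and probabilities are made-up illustration data, and `predict_at_threshold` / `precision_recall_at` are hypothetical helper names:

```python
def predict_at_threshold(probs, threshold=0.5):
    """Turn predicted probabilities P(cancer) into hard 0/1 labels."""
    # Strictly greater than, matching "P(A) > 0.5" in the answer.
    return [1 if p > threshold else 0 for p in probs]


def precision_recall_at(y_true, probs, threshold=0.5):
    """Return one (precision, recall) pair for a single threshold."""
    y_pred = predict_at_threshold(probs, threshold)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else None  # undefined if no positives
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test set: true labels and the model's predicted P(cancer).
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]
print(precision_recall_at(y_true, probs, 0.5))  # (0.75, 0.75)
```

With a single fixed threshold you get exactly one pair, which is why the question ends up with only one precision and one recall value.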

Now, as you change the 0.5 threshold, you get a different result (a different pair). For example, you could already classify a patient as 'has cancer' if P(A) > 0.3. Lowering the threshold like this increases (or at least never decreases) Recall and usually decreases Precision. In other words, you would rather tell someone they have cancer even though they do not, to make sure that patients who do have cancer get the treatment they need. This is the intuitive trade-off between TPR and FPR, between Precision and Recall, or between Sensitivity and Specificity.
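Repeating the same computation for several thresholds is how one trained model yields several Precision-Recall pairs. A sketch on a small hypothetical test set of 8 patients; `pr_pairs` is an illustrative name and the threshold list is arbitrary:

```python
def pr_pairs(y_true, probs, thresholds):
    """Return one (threshold, precision, recall) tuple per threshold."""
    pairs = []
    for thr in thresholds:
        y_pred = [1 if p > thr else 0 for p in probs]  # same rule: P > threshold
        tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
        fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
        fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else 0.0
        pairs.append((thr, precision, recall))
    return pairs


# Hypothetical labels (1 = cancer recurs) and predicted probabilities.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

for thr, prec, rec in pr_pairs(y_true, probs, [0.7, 0.5, 0.3]):
    print(f"threshold={thr:.1f}  precision={prec:.2f}  recall={rec:.2f}")
# threshold=0.7  precision=1.00  recall=0.50
# threshold=0.5  precision=0.75  recall=0.75
# threshold=0.3  precision=0.67  recall=1.00
```

Moving from 0.5 to 0.3 shows the trade-off described above: Recall rises to 1.0 while Precision drops.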

Let's add these terms too, since you will see them more often in biostatistics.

Sensitivity = TP / P = Recall = TPR

Specificity = TN / N = 1 - FPR
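A quick numeric check of these identities, using hypothetical confusion-matrix counts (the function name `sensitivity_specificity` is illustrative only):

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity, FPR) from confusion counts."""
    sensitivity = tp / (tp + fn)  # TP / P, same as Recall and TPR
    specificity = tn / (tn + fp)  # TN / N
    fpr = fp / (fp + tn)          # FP / N
    return sensitivity, specificity, fpr


# Hypothetical counts: 4 TP, 2 FP, 2 TN, 0 FN.
sens, spec, fpr = sensitivity_specificity(4, 2, 2, 0)
print(sens, spec, fpr)  # 1.0 0.5 0.5
print(spec == 1 - fpr)  # True
```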

ROC curves and Precision-Recall curves visualize your classifier's performance over all of these possible thresholds: each threshold yields one point on the curve.
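To draw the full curves you do not have to guess thresholds: every distinct predicted score is a candidate threshold, and sweeping them all gives every point the classifier can reach. Below is a plain-Python sketch, assuming the test set contains at least one positive and one negative. Here a patient counts as positive when score >= threshold (a small variation on the '>' used above, so that every threshold predicts at least one positive). `curve_points` is a hypothetical name; in practice, scikit-learn's `sklearn.metrics.precision_recall_curve` and `sklearn.metrics.roc_curve` compute these points for you.

```python
def curve_points(y_true, scores):
    """Sweep every distinct score as a threshold (positive if score >= thr).

    Returns (threshold, fpr, tpr, precision) tuples, from the strictest
    threshold (fewest positives) to the loosest (everyone positive).
    """
    p = sum(1 for t in y_true if t == 1)  # actual positives (assumed > 0)
    n = len(y_true) - p                   # actual negatives (assumed > 0)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        # tp + fp >= 1 here, because the patient scoring exactly thr is positive.
        points.append((thr, fp / n, tp / p, tp / (tp + fp)))
    return points


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

for thr, fpr, tpr, prec in curve_points(y_true, scores):
    print(f"thr={thr:.2f}  FPR={fpr:.2f}  TPR/Recall={tpr:.2f}  Precision={prec:.2f}")
```

Plotting Precision against Recall from these tuples gives the P-R curve; plotting TPR against FPR gives the ROC curve. Each model in a comparison plot is simply one such sweep.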

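You should consider these metrics if accuracy alone is not a suitable quality measure. If cancer recurrence is rare in your data, classifying all patients as 'does not have cancer' gives you a very high accuracy, yet such a classifier finds no patient with cancer at all: its Recall (TPR) and FPR are both 0 and its Precision is undefined (0/0), which the ROC and Precision-Recall curves expose immediately.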
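A small illustration of why accuracy can mislead, on a hypothetical imbalanced test set (5 recurrences among 100 patients); `accuracy_and_recall` is a made-up helper:

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels (1 = positive)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return correct / len(y_true), tp / positives


# Hypothetical test set: 5 patients with recurrence, 95 without.
y_true = [1] * 5 + [0] * 95
all_negative = [0] * len(y_true)  # "does not have cancer" for everyone

acc, rec = accuracy_and_recall(y_true, all_negative)
print(acc, rec)  # 0.95 0.0
```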

answered Sep 21 '22 21:09

lnathan

Related questions

Is F1 micro the same as Accuracy?
How to use fit_generator with multiple inputs
Save python random forest model to file
How to duplicate an estimator in order to use it on multiple data sets?
How to get a classifier's confidence score for a prediction in sklearn?
Printing all the contents of a tensor
Choosing from different cost function and activation function of a neural network
Using Smote with Gridsearchcv in Scikit-learn
Soft attention vs. hard attention
What's the difference between LibSVM and LibLinear
Is it possible to do multivariate multi-step forecasting using FB Prophet?
What is weakly supervised learning (bootstrapping)?
Maximum Likelihood Estimate pseudocode
How does Pytorch's "Fold" and "Unfold" work?
Request for example: Recurrent neural network for predicting next value in a sequence
Create Bayesian Network and learn parameters with Python3.x [closed]
Training on imbalanced data using TensorFlow
Hyperparameter optimization for Deep Learning Structures using Bayesian Optimization
Building a mutlivariate, multi-task LSTM with Keras
What is a bad, decent, good, and excellent F1-measure range?
