In scikit's precision_recall_curve, why does thresholds have a different dimension from recall and precision?

Question

I want to see how precision and recall vary with the threshold (not just with each other):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

model = RandomForestClassifier(n_estimators=500, n_jobs=-1)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, probas)
print(len(precision))
print(len(thresholds))

Returns:

283  
282

So I cannot plot precision and recall against the thresholds. Any clues as to why this might be the case?

Suvam · Accepted Answer

For this problem, the last precision and recall values should be ignored. They are always 1. and 0. respectively and do not have a corresponding threshold, which is why precision and recall each have one more element than thresholds.

For example, here is a solution:

import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # Drop the last precision/recall pair: it has no matching threshold.
    plt.figure(figsize=(8, 5))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend()
    plt.show()

plot_precision_recall_vs_threshold(precision, recall, thresholds)

These extra values are there so that the curve starts on the y-axis (x=0, i.e. recall = 0) when you are plotting precision vs recall.
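
To see where the extra element comes from, here is a rough pure-Python sketch of the idea. It is a simplified illustration, not scikit-learn's actual implementation; the function name pr_curve_sketch and the toy data are made up for this example, and it assumes binary 0/1 labels with at least one positive.

```python
def pr_curve_sketch(y_true, scores):
    """Simplified illustration only, not scikit-learn's implementation."""
    # One candidate threshold per distinct score, in increasing order.
    thresholds = sorted(set(scores))
    total_pos = sum(y_true)  # assumes labels are 0/1 with at least one 1
    precision, recall = [], []
    for t in thresholds:
        # Predict positive whenever score >= t.
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        n_pred = sum(predicted)  # always >= 1 because t is one of the scores
        precision.append(float(tp) / n_pred)
        recall.append(float(tp) / total_pos)
    # Extra endpoint with no threshold: nothing is predicted positive,
    # so recall is 0 and precision is set to 1 by convention.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds

# Hypothetical toy data
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
precision, recall, thresholds = pr_curve_sketch(y_true, scores)
print("%d %d %d" % (len(precision), len(recall), len(thresholds)))  # 5 5 4
```

Every threshold contributes one (precision, recall) pair, and the final (1, 0) pair is appended without a threshold, so slicing with [:-1] lines the arrays up again. Depending on the scikit-learn version, the real function may also drop some of the lowest thresholds, but the one-element difference in length stays.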

Tags: python, python-2.7, scikit-learn, precision-recall

sapo_cosmico
