I want to see how precision and recall vary with the threshold (not just with each other):
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

model = RandomForestClassifier(n_estimators=500, n_jobs=-1)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, probas)
print(len(precision))
print(len(thresholds))
Returns:
283
282
I therefore cannot plot them together. Any clues as to why this might be the case?
For this problem, the last precision and recall values should be ignored: they are always 1. and 0. respectively and do not have a corresponding threshold.
For example, here is a solution:
import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recall, thresholds):
    # Drop the final precision/recall pair, which has no matching threshold.
    plt.figure(figsize=(8, 5))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recall[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend()
    plt.show()

plot_precision_recall_vs_threshold(precision, recall, thresholds)
These values are there so that the plot starts on the y-axis (x = 0) when you plot precision against recall.
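The off-by-one lengths are easy to confirm on toy data. The sketch below is only an illustration: the labels and scores are made up, and it assumes scikit-learn is installed.

```python
from sklearn.metrics import precision_recall_curve

# Made-up labels and scores, purely for illustration.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# There is always one more precision/recall pair than there are thresholds.
print(len(precision), len(recall), len(thresholds))

# The extra, final pair is the fixed endpoint (precision 1, recall 0).
print(precision[-1], recall[-1])

# Dropping it lines every remaining pair up with a threshold.
for t, p, r in zip(thresholds, precision[:-1], recall[:-1]):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

The exact number of thresholds may differ between scikit-learn versions, but the final pair should always be the one without a threshold, which is why slicing with [:-1] is enough to align the arrays.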