I am asking a follow-up question, as suggested in my previous post, Good ROC curve but poor precision-recall curve. I am only using the default settings with Python scikit-learn. It seems like the optimization is for AUC-ROC, but I am more interested in optimizing precision-recall. The following is my code.
# Imports (classifierUsed2, X_test, y_test, ethnicity_tar, color,
# ax1 and ax2 are defined earlier in my script)
from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Get ROC
y_score = classifierUsed2.decision_function(X_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(false_positive_rate, true_positive_rate)
print('AUC-' + ethnicity_tar + '=', roc_auc)
# Plotting
ax1.plot(false_positive_rate, true_positive_rate, c=color, label=('AUC-'+ethnicity_tar+'= %0.2f' % roc_auc))
ax1.plot([0, 1], [0, 1], color='lightgrey', linestyle='--')
ax1.legend(loc='lower right', prop={'size': 8})
# Get P-R pairs
precision, recall, prThreshold = precision_recall_curve(y_test, y_score)
# Plotting
ax2.plot(recall, precision, c=color, label=ethnicity_tar)
ax2.legend(loc='upper right', prop={'size': 8})
Where and how do I insert Python code to change the settings so that I can optimize for precision-recall?
There are in fact two questions in yours: how to measure the quality of a precision-recall curve, and how to optimize for it. I will answer them in turn:
1. The measure of the quality of a precision-recall curve is average precision, which equals the exact area under the non-interpolated (that is, piecewise-constant) precision-recall curve (a sketch of computing it in scikit-learn follows this list).
2. To maximize average precision, you can only tune the hyperparameters of your algorithm. You can do this with GridSearchCV if you set scoring='average_precision' (sketched further below), or you can find optimal hyperparameters manually or with some other tuning technique.
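To compute average precision from point 1, scikit-learn provides average_precision_score. A minimal sketch reusing y_test, y_score, and ethnicity_tar from your snippet:

from sklearn.metrics import average_precision_score

# Average precision = exact area under the non-interpolated P-R curve;
# y_test and y_score are the same arrays used in the ROC code above.
average_precision = average_precision_score(y_test, y_score)
print('AP-' + ethnicity_tar + '=', average_precision)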
Returning to point 2: it is generally impossible to optimize average precision directly (during model fitting), although there are some exceptions; e.g., this article describes an SVM that maximizes average precision.
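In the common case, then, hyperparameter search is the practical route. A minimal GridSearchCV sketch follows; the LinearSVC estimator, the parameter grid, and the X_train/y_train names are illustrative assumptions, so substitute your own classifier and training data:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Hypothetical grid; replace with the hyperparameters of your own classifier.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# scoring='average_precision' makes the search select the hyperparameters
# that maximize average precision across the cross-validation folds.
grid = GridSearchCV(LinearSVC(), param_grid, scoring='average_precision', cv=5)
grid.fit(X_train, y_train)

print('Best parameters:', grid.best_params_)
print('Best CV average precision:', grid.best_score_)

The fitted grid.best_estimator_ can then stand in for classifierUsed2 in your plotting code.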