In this simplified example, I've trained a learner with GridSearchCV. I would like to return the confusion matrix of the best learner when predicting on the full set X.
lr_pipeline = Pipeline([('clf', LogisticRegression())])
lr_parameters = {}
lr_gs = GridSearchCV(lr_pipeline, lr_parameters, n_jobs=-1)
lr_gs = lr_gs.fit(X, y)
print(lr_gs.confusion_matrix)  # Would like to be able to do this
Thanks
I found this question while searching for how to calculate the confusion matrix while fitting scikit-learn's GridSearchCV. I was able to find a solution by defining a custom scoring function, although it's somewhat kludgy. I'm leaving this answer for anyone else who makes a similar search.
As mentioned by @MLgeek and @bugo99iot, the accepted answer by @Sudeep Juvekar isn't really satisfactory. It offers a literal answer to the original question as asked, but it's not usually the case that a machine learning practitioner would be interested in the confusion matrix of a fitted model on its training data. It is more typically of interest to know how well a model generalizes to data it hasn't seen.
To use a custom scoring function in GridSearchCV, you will need to import the scikit-learn helper function make_scorer.
from sklearn.metrics import make_scorer
The custom scoring function looks like this:
def _count_score(y_true, y_pred, label1=0, label2=1):
    return sum((y == label1 and pred == label2)
               for y, pred in zip(y_true, y_pred))
For a given pair of labels, (label1, label2), it calculates the number of examples where the true value of y is label1 and the predicted value of y is label2.
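As a quick sanity check (toy arrays of my own, not from the original answer), the function simply counts matching (true, predicted) pairs:
y_true_example = [0, 0, 1, 1, 0]
y_pred_example = [1, 1, 1, 0, 0]

# Two examples have a true label of 0 but a predicted label of 1,
# so this prints 2 -- the (0, 1) entry of the confusion matrix.
print(_count_score(y_true_example, y_pred_example, label1=0, label2=1))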
To start, find all of the labels in the training data
all_labels = sorted(set(y))
The optional argument scoring of GridSearchCV can receive a dictionary mapping strings to scorers. make_scorer can take a scoring function along with bindings for some of its parameters and produce a scorer, which is a particular type of callable that is used for scoring in GridSearchCV, cross_val_score, etc. Let's build up this dictionary for each pair of labels.
scorer = {}
for label1 in all_labels:
    for label2 in all_labels:
        count_score = make_scorer(_count_score, label1=label1,
                                  label2=label2)
        scorer['count_%s_%s' % (label1, label2)] = count_score
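For binary labels 0 and 1 (an assumed example, not from the original answer), the dictionary ends up with one scorer per cell of the confusion matrix:
# Hypothetical check with all_labels == [0, 1]:
print(sorted(scorer))  # ['count_0_0', 'count_0_1', 'count_1_0', 'count_1_1']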
You'll also want to add any additional scoring functions you're interested in. To avoid getting into the subtleties of scoring for multi-class classification, let's add a simple accuracy score.
# import placed here for the sake of demonstration.
# Should be imported alongside make_scorer above
from sklearn.metrics import accuracy_score
scorer['accuracy'] = make_scorer(accuracy_score)
We can now fit GridSearchCV
num_splits = 5
lr_gs = GridSearchCV(lr_pipeline, lr_parameters, n_jobs=-1,
                     scoring=scorer, refit='accuracy',
                     cv=num_splits)
refit='accuracy' tells GridSearchCV that it should judge by the best accuracy score to decide on the parameters to use when refitting. In the case where you are passing a dictionary of multiple scorers to scoring, if you do not pass a value to the optional argument refit, GridSearchCV will not refit the model on all training data. We've explicitly set the number of splits because we'll need to know this later.
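A quick aside of my own (not part of the original answer): after fitting on the asker's X and y, the search object exposes the refit model and the index of the winning candidate, precisely because a refit value was passed.
lr_gs = lr_gs.fit(X, y)

# These attributes are populated because refit='accuracy' was set.
print(lr_gs.best_params_)  # parameters of the winning candidate
print(lr_gs.best_index_)   # its row index into lr_gs.cv_results_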
Now, for each of the training folds used in cross-validation, essentially what we've done is calculate the confusion matrix on the respective test folds. The test folds do not overlap and cover the entire space of data; we've therefore made predictions for each data point in X in such a way that the prediction for each point does not depend on the associated target label for that point.
We can add up the confusion matrices associated to the test folds to get something useful that gives information on how well the model generalizes. It can also be interesting to look at the confusion matrices for the test folds separately and do stuff like calculate variances.
We're not done yet though. We need to actually pull out the confusion matrix for the best estimator. In this example, the cross-validation results will be stored in the dictionary lr_gs.cv_results_. First let's get the index in the results corresponding to the best set of parameters:
best_index = lr_gs.cv_results_['rank_test_accuracy'].argmin()  # rank 1 marks the best candidate
If you are using a different metric to decide upon the best parameters, substitute for 'accuracy' the key you are using for the associated scorer in the scoring dictionary passed to GridSearchCV.
In my own application I chose to store the confusion matrix as a nested dictionary.
confusion = defaultdict(lambda: defaultdict(int))
for label1 in all_labels:
    for label2 in all_labels:
        for i in range(num_splits):
            key = 'split%s_test_count_%s_%s' % (i, label1, label2)
            val = int(lr_gs.cv_results_[key][best_index])
            confusion[label1][label2] += val
confusion = {key: dict(value) for key, value in confusion.items()}
There's some stuff to unpack here. defaultdict(lambda: defaultdict(int)) constructs a nested defaultdict; a defaultdict of defaultdicts of ints (if you're copying and pasting, don't forget to add from collections import defaultdict at the top of your file). The last line of this snippet is used to turn confusion into a regular dict of dicts of ints. Never leave defaultdicts lying around when they are no longer needed.
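Returning to the earlier remark about looking at the test folds separately: here is a small sketch of my own (assuming binary labels 0 and 1, which are not in the original answer) that pulls the per-fold counts for a single cell and looks at their spread across folds.
import numpy as np

# Per-fold counts of the (true=0, predicted=1) cell for the best parameter setting.
cell_counts = np.array([
    lr_gs.cv_results_['split%s_test_count_0_1' % i][best_index]
    for i in range(num_splits)
])
print(cell_counts)        # one count per test fold
print(cell_counts.var())  # fold-to-fold variance of that confusion-matrix entry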
You will likely want to store your confusion matrix in a different way. The key fact is that the confusion matrix entry for the pair of labels (label1, label2) for test fold i is stored in
lr_gs.cv_results_['split%s_test_count_%s_%s' % (i, label1, label2)][best_index]
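For instance, if you would rather have a plain 2-D array in the same layout as sklearn's confusion_matrix (rows indexed by true label, columns by predicted label), a sketch along these lines works, assuming the confusion dict and all_labels built above.
import numpy as np

# Rows follow the true label, columns the predicted label, both in all_labels order.
confusion_array = np.array([
    [confusion[label1][label2] for label2 in all_labels]
    for label1 in all_labels
])
print(confusion_array)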
See here for an example of this confusion matrix calculation used in practice. I think it's a bit of a code smell to rely on the specific format of the keys in the cv_results_ dictionary, but this does work, at least as of the day of this post.
You will first need to predict using the best estimator in your GridSearchCV. A common method to use is GridSearchCV.decision_function(), but for your example, decision_function returns continuous decision scores from LogisticRegression rather than class labels, so it does not work with confusion_matrix. Instead, find the best estimator using lr_gs and predict the labels using that estimator.
y_pred = lr_gs.best_estimator_.predict(X)
Finally, use sklearn's confusion_matrix on the real and predicted y:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y, y_pred))
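For completeness, here is a minimal self-contained sketch of this approach, using made-up data from make_classification in place of the asker's X and y:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the asker's data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

lr_pipeline = Pipeline([('clf', LogisticRegression())])
lr_gs = GridSearchCV(lr_pipeline, {}, n_jobs=-1)  # empty parameter grid, as in the question
lr_gs = lr_gs.fit(X, y)

# Predict with the refit best estimator and build the confusion matrix on the full set.
y_pred = lr_gs.best_estimator_.predict(X)
print(confusion_matrix(y, y_pred))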