
What is the meaning of 'mean_test_score' in cv_result?

Hello, I'm doing a GridSearchCV and I'm printing the results with the .cv_results_ attribute from scikit-learn.

My problem is that when I evaluate the mean over all the test score splits by hand, I obtain a different number from what is written in 'mean_test_score'. Is it computed differently from the standard np.mean()?

I attach here the code with the result:

n_estimators = [100]
max_depth = [3]
learning_rate = [0.1]

param_grid = dict(max_depth=max_depth, n_estimators=n_estimators, learning_rate=learning_rate)

gkf = GroupKFold(n_splits=7)


grid_search = GridSearchCV(model, param_grid, scoring=score_auc, cv=gkf)
grid_result = grid_search.fit(X, Y, groups=patients)

grid_result.cv_results_

The result of this operation is:

{'mean_fit_time': array([ 8.92773601]),
 'mean_score_time': array([ 0.04288721]),
 'mean_test_score': array([ 0.83490629]),
 'mean_train_score': array([ 0.95167036]),
 'param_learning_rate': masked_array(data = [0.1],
              mask = [False],
        fill_value = ?),
 'param_max_depth': masked_array(data = [3],
              mask = [False],
        fill_value = ?),
 'param_n_estimators': masked_array(data = [100],
              mask = [False],
        fill_value = ?),
 'params': ({'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100},),
 'rank_test_score': array([1]),
 'split0_test_score': array([ 0.74821666]),
 'split0_train_score': array([ 0.97564995]),
 'split1_test_score': array([ 0.80089016]),
 'split1_train_score': array([ 0.95361201]),
 'split2_test_score': array([ 0.92876979]),
 'split2_train_score': array([ 0.93935856]),
 'split3_test_score': array([ 0.95540287]),
 'split3_train_score': array([ 0.94718634]),
 'split4_test_score': array([ 0.89083901]),
 'split4_train_score': array([ 0.94787374]),
 'split5_test_score': array([ 0.90926355]),
 'split5_train_score': array([ 0.94829775]),
 'split6_test_score': array([ 0.82520379]),
 'split6_train_score': array([ 0.94971417]),
 'std_fit_time': array([ 1.79167576]),
 'std_score_time': array([ 0.02970254]),
 'std_test_score': array([ 0.0809713]),
 'std_train_score': array([ 0.0105566])}

As you can see, taking the np.mean of all the test_score values gives approximately 0.8655122606479532, while 'mean_test_score' is 0.83490629.

Thanks for your help, Leonardo.

asked Jul 06 '17 by Dipe


People also ask

What is rank test score?

rank_test_score indicates the rank of a grid search parameter combination based on the mean_test_score. If you try N parameter combinations in your grid search, rank_test_score ranges from 1 to N.

What is CV in grid search?

GridSearchCV is a technique to search for the best parameter values from a given grid of parameters. It is basically a cross-validation method; the model and the parameters need to be fed in. The best parameter values are extracted and then the predictions are made.

What is Param_grid in GridSearchCV?

param_grid – a dictionary with parameter names as keys and lists of parameter values. scoring – the performance measure, for example 'r2' for regression models or 'precision' for classification models.

What is grid search cross-validation?

Grid search cross-validation is a technique to select the best machine learning model, parameterized by a grid of hyperparameters. The scikit-learn library comes with a grid search cross-validation implementation.
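
As a minimal illustration of the pieces described above (estimator, param_grid, scoring, cv), here is a small sketch; the dataset, classifier and parameter values are made up for demonstration and are not the poster's setup:

# Illustrative GridSearchCV sketch (values are examples only)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid,
                    scoring="roc_auc",  # the performance measure
                    cv=5)               # 5-fold cross-validation
grid.fit(X, y)

# cv_results_ holds per-split scores plus the mean and rank per combination
print(grid.cv_results_["mean_test_score"])
print(grid.cv_results_["rank_test_score"])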


2 Answers

I will post this as a new answer since it's so much code:

The test and train scores of the folds are: (taken from the results you posted in your question)

test_scores = [0.74821666,0.80089016,0.92876979,0.95540287,0.89083901,0.90926355,0.82520379]
train_scores = [0.97564995,0.95361201,0.93935856,0.94718634,0.94787374,0.94829775,0.94971417]

The numbers of training and test samples in those folds are: (taken from the output of print([(len(train), len(test)) for train, test in gkf.split(X, groups=patients)]))

train_len = [41835, 56229, 56581, 58759, 60893, 60919, 62056]
test_len = [24377, 9983, 9631, 7453, 5319, 5293, 4156]

Then the train and test means, each weighted by the number of samples in the corresponding folds, are:

import numpy as np

train_avg = np.average(train_scores, weights=train_len)
# -> 0.95064898361714389
test_avg = np.average(test_scores, weights=test_len)
# -> 0.83490628649308296

So this is exactly the value sklearn gives you. It is also the correct mean score of your classification. The plain mean of the folds is misleading in that it depends on the somewhat arbitrary splits/folds you chose.
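
For comparison, a plain unweighted np.mean over the same fold scores reproduces roughly the value computed by hand in the question:

import numpy as np

# fold test scores copied from the cv_results_ output in the question
test_scores = [0.74821666, 0.80089016, 0.92876979, 0.95540287,
               0.89083901, 0.90926355, 0.82520379]
print(np.mean(test_scores))  # ~0.86551, the unweighted hand-computed value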

So in conclusion, both explanations were indeed identical and correct.

answered Oct 19 '22 by Johannes


If you look at the original code of GridSearchCV in the scikit-learn GitHub repository, they don't use np.mean(); instead they use np.average() with weights. Hence the difference. Here's their code:

n_splits = 3
test_sample_counts = np.array(test_sample_counts[:n_splits],
                              dtype=np.int)
weights = test_sample_counts if self.iid else None
means = np.average(test_scores, axis=1, weights=weights)
stds = np.sqrt(np.average((test_scores - means[:, np.newaxis]) ** 2,
                          axis=1, weights=weights))

cv_results = dict()
for split_i in range(n_splits):
    cv_results["split%d_test_score" % split_i] = test_scores[:, split_i]
cv_results["mean_test_score"] = means
cv_results["std_test_score"] = stds

In case you want to know more about the difference between them, take a look at Difference between np.mean() and np.average().
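
In short, np.mean() weights every element equally, while np.average() accepts an optional weights argument; a tiny illustration:

import numpy as np

scores = [0.5, 1.0]
print(np.mean(scores))                     # 0.75  (unweighted)
print(np.average(scores, weights=[3, 1]))  # 0.625 (weighted mean)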

answered Oct 19 '22 by Bharath