Why doesn't GridSearchCV give the best score? - Scikit Learn

I have a dataset with 158 rows and 10 columns. I am trying to build a multiple linear regression model to predict future values.

I used GridSearchCV for tuning the parameters.

Here is my GridSearchCV and regression function:

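from sklearn import linear_model
from sklearn.model_selection import GridSearchCV, train_test_split

def GridSearch(data, ground_truth_data):
    # Hold out 30% of the rows; the grid search never sees them
    X_train, X_test, y_train, y_test = train_test_split(data, ground_truth_data, test_size=0.3, random_state=0)

    # 'normalize' was removed from LinearRegression in scikit-learn 1.2,
    # so only the remaining options are searched
    parameters = {'fit_intercept': [True, False], 'copy_X': [True, False]}

    model = linear_model.LinearRegression()

    grid = GridSearchCV(model, parameters)

    # Cross-validation runs inside fit, on the training data only
    grid.fit(X_train, y_train)
    predictions = grid.predict(X_test)

    print("Grid best score: ", grid.best_score_)
    print("Grid score function: ", grid.score(X_test, y_test))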

The output of this code is:

Grid best score: 0.720298870251

Grid score function: 0.888263112299

My question: what is the difference between best_score_ and the score function?

How can the score function return a better value than best_score_?

Thanks in advance.

Batuhan B asked May 25 '15 16:05


People also ask

What is GridSearchCV best score?

best_score_ from GridSearchCV is the mean cross-validated score of best_estimator_. For example, with 5-fold cross-validation, GridSearchCV splits the data it is given into 5 folds and, for each parameter combination, trains the model 5 times, each time scoring it on the fold that was held out.
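As a minimal sketch of that definition, assuming synthetic data from `make_regression` sized like the question's (158 rows, 10 features; `noise=100.0` is an arbitrary choice) and `cv=5`: `best_score_` should match the plain average of the five per-fold scores that `cv_results_` stores for the winning candidate (`best_index_`).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the question's data (assumption: 158 rows, 10 features)
X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)

grid = GridSearchCV(LinearRegression(), {"fit_intercept": [True, False]}, cv=5)
grid.fit(X, y)

# Every candidate was trained and scored once per fold; collect the
# held-out-fold scores of the best candidate
fold_scores = [grid.cv_results_[f"split{i}_test_score"][grid.best_index_] for i in range(5)]

print("per-fold scores:", np.round(fold_scores, 4))
print("mean of folds:  ", np.mean(fold_scores))
print("best_score_:    ", grid.best_score_)
```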

Which is better GridSearchCV or RandomizedSearchCV?

The main difference is that GridSearchCV trains and evaluates every combination you list, whereas RandomizedSearchCV evaluates only a fixed number (n_iter) of combinations sampled at random from the lists or distributions you supply. Both are effective ways to tune parameters for better generalization; randomized search is usually cheaper when the grid is large.
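A short sketch contrasting the two searches, under stated assumptions: the data are synthetic (`make_regression`), and `Ridge` with a hypothetical list of `alpha` values is swapped in only because plain `LinearRegression` has little worth searching. The grid search should evaluate all five values, while the randomized search samples three of them.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)

# Hypothetical search space; Ridge is used only because alpha is worth searching
alphas = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# Grid search: every listed value is tried
grid = GridSearchCV(Ridge(), alphas, cv=5).fit(X, y)

# Randomized search: only n_iter values are sampled from the same list
rand = RandomizedSearchCV(Ridge(), alphas, n_iter=3, cv=5, random_state=0).fit(X, y)

print("grid tried:  ", [p["alpha"] for p in grid.cv_results_["params"]])
print("random tried:", [p["alpha"] for p in rand.cv_results_["params"]])
print("best params: ", grid.best_params_, rand.best_params_)
```

Because every entry here is a list, RandomizedSearchCV samples without replacement; passing a continuous distribution (for example from `scipy.stats`) instead lets it draw values that no grid would list.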

How do I see GridSearchCV results?

After fitting (note that fit is called on the GridSearchCV object, not on the estimator itself), the results are available in the cv_results_ attribute of sklearn.model_selection.GridSearchCV (formerly sklearn.grid_search).
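A minimal sketch of reading `cv_results_` without pandas, assuming synthetic data and the question's parameter grid minus `normalize` (which `LinearRegression` no longer accepts as of scikit-learn 1.2). Each key maps to an array with one entry per parameter combination.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)

# The question's grid, without the removed 'normalize' option
parameters = {"fit_intercept": [True, False], "copy_X": [True, False]}
grid = GridSearchCV(LinearRegression(), parameters, cv=5).fit(X, y)

# cv_results_ is a dict of arrays, one entry per parameter combination
results = grid.cv_results_
for params, mean, std, rank in zip(results["params"], results["mean_test_score"],
                                   results["std_test_score"], results["rank_test_score"]):
    print(f"rank {rank}: {params}  mean R^2 = {mean:.4f} (+/- {std:.4f})")
```

Since `copy_X` only controls whether X is copied before fitting, rows that differ only in `copy_X` should show the same scores and share a rank.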


1 Answer

best_score_ is the best score from the cross-validation: for each parameter combination, the model is fit on part of the training data and scored by predicting the rest of the training data, and best_score_ is the highest of those averaged fold scores. This is because you passed X_train and y_train to fit; the fit process thus knows nothing about your test set, only your training set.
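To check this, a minimal sketch (synthetic data via `make_regression` is an assumption; the 70/30 split mirrors the question) re-runs 5-fold cross-validation on the training rows for the winning parameters. With `cv=5`, GridSearchCV and `cross_val_score` both use an unshuffled `KFold(5)` for a regressor, so the mean should reproduce `best_score_` without the test rows ever being used.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

grid = GridSearchCV(LinearRegression(), {"fit_intercept": [True, False]}, cv=5)
grid.fit(X_train, y_train)

# Re-run 5-fold cross-validation for the winning parameters on the
# training data only; X_test and y_test are never touched
cv_scores = cross_val_score(LinearRegression(**grid.best_params_), X_train, y_train, cv=5)

print("best_score_:          ", grid.best_score_)
print("cross_val_score mean: ", cv_scores.mean())
```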

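The score method of the grid object scores the model on the data you give it. You passed X_test and y_test, so this call computes the score of the fitted (i.e., tuned) model on the test set. By default (refit=True), that model is best_estimator_, which GridSearchCV refits on the whole training set after the search.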
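A minimal sketch of what `grid.score` computes, assuming synthetic data and the default `refit=True`: it should agree with scoring `best_estimator_` directly, and with `r2_score` (the default score for a regressor) on the test-set predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

grid = GridSearchCV(LinearRegression(), {"fit_intercept": [True, False]}, cv=5)
grid.fit(X_train, y_train)  # refit=True: best_estimator_ is refit on all of X_train

# Three ways of computing the same test-set score
via_grid = grid.score(X_test, y_test)
via_best = grid.best_estimator_.score(X_test, y_test)
via_metric = r2_score(y_test, grid.best_estimator_.predict(X_test))

print(via_grid, via_best, via_metric)
```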

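In short, the two scores are calculated on different data sets, so it shouldn't be surprising that they are different. With only 158 rows, the test set holds just 48 of them, so its score can easily land above the cross-validated average by chance; the refitted model is also trained on all 110 training rows, whereas each cross-validation model sees only 88.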
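A minimal sketch of how much a single test-set score can move, assuming synthetic data of the question's size and a hypothetical loop over 20 different `random_state` values for the split (the question uses only `random_state=0`). It records `best_score_` and the test score for each split and counts how often the test score comes out higher.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=158, n_features=10, noise=100.0, random_state=0)

cv_means, test_scores = [], []
for seed in range(20):
    # A different 70/30 split each time (hypothetical seeds 0..19)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=seed)
    grid = GridSearchCV(LinearRegression(), {"fit_intercept": [True, False]}, cv=5)
    grid.fit(X_train, y_train)
    cv_means.append(grid.best_score_)             # mean CV score on training rows
    test_scores.append(grid.score(X_test, y_test))  # score on the 48 held-out rows

cv_means = np.array(cv_means)
test_scores = np.array(test_scores)
print("test-set R^2 range:", test_scores.min(), "to", test_scores.max())
print("splits where test score > best_score_:", int(np.sum(test_scores > cv_means)))
```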

BrenBarn answered Oct 20 '22 00:10