I have a dataset with 158 rows and 10 columns. I am trying to build a multiple linear regression model and predict future values.
I used GridSearchCV for tuning the parameters.
Here is my GridSearchCV and regression function:
from sklearn import linear_model
from sklearn.model_selection import GridSearchCV, train_test_split

def GridSearch(data, ground_truth_data):
    # Hold out 30% of the data as a test set
    X_train, X_test, y_train, y_test = train_test_split(
        data, ground_truth_data, test_size=0.3, random_state=0)
    # Note: 'normalize' was removed from LinearRegression in scikit-learn 1.2;
    # drop it from the grid on newer versions
    parameters = {'fit_intercept': [True, False], 'normalize': [True, False],
                  'copy_X': [True, False]}
    model = linear_model.LinearRegression()
    grid = GridSearchCV(model, parameters)
    grid.fit(X_train, y_train)
    predictions = grid.predict(X_test)
    print("Grid best score: ", grid.best_score_)
    print("Grid score function: ", grid.score(X_test, y_test))
The output of this code is:
Grid best score: 0.720298870251
Grid score function: 0.888263112299
My question is: what is the difference between best_score_ and the score function? How can the score function give a better result than best_score_?
Thanks in advance.
best_score_ from GridSearchCV is the mean cross-validated score of the best_estimator_. For example, with 5-fold cross-validation, GridSearchCV splits the training data into 5 folds and fits each parameter combination 5 times, scoring on the held-out fold each time; best_score_ is the mean of those 5 held-out scores for the winning combination.
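As a minimal sketch (using synthetic data whose shape mirrors the question's 158 x 10 dataset), best_score_ can be reconstructed by averaging the per-fold test scores of the best candidate stored in cv_results_:

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the question's data (158 rows, 10 columns)
rng = np.random.RandomState(0)
X = rng.rand(158, 10)
y = X @ rng.rand(10) + rng.normal(scale=0.1, size=158)

grid = GridSearchCV(linear_model.LinearRegression(),
                    {'fit_intercept': [True, False]}, cv=5)
grid.fit(X, y)

# best_score_ is simply the mean of the 5 per-fold test scores
fold_scores = [grid.cv_results_['split%d_test_score' % i][grid.best_index_]
               for i in range(5)]
print(grid.best_score_, np.mean(fold_scores))  # identical values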
The only difference between the two approaches is that in grid search we define the combinations explicitly and train the model on each of them, whereas RandomizedSearchCV samples the combinations at random. Both are effective ways of tuning hyperparameters to improve model generalizability.
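For comparison, here is a hedged sketch of RandomizedSearchCV. Ridge is used instead of LinearRegression only because plain least squares has no continuous hyperparameter worth sampling; the alpha distribution is an illustrative choice, not something from the question:

from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Sample 20 random alpha values from [0, 10] instead of an exhaustive grid
search = RandomizedSearchCV(Ridge(),
                            {'alpha': uniform(0, 10)},
                            n_iter=20, cv=5, random_state=0)
# search.fit(X_train, y_train) then behaves exactly like grid.fit(...)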
After fitting the models (note that we call fit on the GridSearchCV object rather than on the estimator itself), we can inspect the results using the sklearn.model_selection.GridSearchCV.cv_results_ attribute.
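For example (assuming the fitted grid object from the sketch above), cv_results_ is a dict of arrays that is convenient to view as a DataFrame:

import pandas as pd

results = pd.DataFrame(grid.cv_results_)
# One row per parameter combination, with per-fold and mean test scores
print(results[['params', 'mean_test_score', 'rank_test_score']])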
The best_score_ is the best score from the cross-validation. That is, the model is fit on part of the training data, and the score is computed by predicting the rest of the training data. This is because you passed X_train and y_train to fit; the fit process thus does not know anything about your test set, only your training set.
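As a sketch (assuming the grid object and training split from the question, with 'normalize' dropped on newer scikit-learn as noted above), the same mean cross-validated score can be reproduced by running cross_val_score on the training set with the best parameter combination:

from sklearn.model_selection import cross_val_score

best_model = linear_model.LinearRegression(**grid.best_params_)
# Same 5-fold CV over the training data that GridSearchCV performed internally
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5)
print(cv_scores.mean())  # matches grid.best_score_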
The score method of the model object scores the model on the data you give it. You passed X_test and y_test, so this call computes the score of the fit (i.e., tuned) model on the test set.
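Since LinearRegression's score method returns R^2, the test-set number in the question is equivalent to computing r2_score on the held-out predictions (a sketch using the question's variables):

from sklearn.metrics import r2_score

print(grid.score(X_test, y_test))              # 0.888...
print(r2_score(y_test, grid.predict(X_test)))  # same value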
In short, the two scores are calculated on different data sets, so it shouldn't be surprising that they are different.