I've been trying to figure out how the best_score_ attribute of GridSearchCV is calculated (in other words, what it means). The documentation says:
Score of best_estimator on the left out data.
So I tried to translate it into something I understand, and calculated the r2_score of the actual ys against the predicted ys of each k-fold split, but got different results (using this piece of code):
import numpy as np
from sklearn.metrics import r2_score

# kfold: an iterable of (train_indices, test_indices) splits
test_pred = np.zeros(y.shape) * np.nan
for train_ind, test_ind in kfold:
    clf.best_estimator_.fit(X[train_ind, :], y[train_ind])
    test_pred[test_ind] = clf.best_estimator_.predict(X[test_ind])
r2_test = r2_score(y, test_pred)  # r2 on the pooled out-of-fold predictions
I've searched everywhere for a more meaningful explanation of the best_score_ and couldn't find anything. Would anyone care to explain?
Thanks
The newer documentation entry is more explicit:
best_score_ : float
Mean cross-validated score of the best_estimator. For multi-metric evaluation, this is present only if refit is specified. This attribute is not available if refit is a function.
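Concretely, once the search has been fitted, best_score_ is just the mean_test_score entry of cv_results_ for the winning candidate. A quick sanity check on a fitted search object (here called clf, as in the question):

import numpy as np

# best_score_ equals the mean validation score of the winning candidate
assert np.isclose(clf.best_score_,
                  clf.cv_results_['mean_test_score'][clf.best_index_])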
GridSearchCV tries every combination of the values passed in the dictionary and evaluates the model for each combination using cross-validation. After the search we have a score for every combination of hyperparameters, and we can choose the one with the best performance.
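For example (a minimal sketch; the Ridge estimator, the alpha grid and the synthetic data are illustrative choices, not anything from the question):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=1.0)
param_grid = {'alpha': [0.1, 1.0, 10.0]}       # 3 candidate combinations
search = GridSearchCV(Ridge(), param_grid, cv=5).fit(X, y)
print(search.best_params_)  # the combination with the highest mean CV score
print(search.best_score_)   # its mean score over the 5 validation folds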
Cross-validation and GridSearchCV: cross-validation is used while training the model. Before training a model we divide the data into two parts, train data and test data. Cross-validation then repeatedly splits the train data further into two parts, a training part and a validation part.
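For example, scikit-learn's KFold carves a different validation fold out of the train data on each split (a minimal sketch on a toy array):

import numpy as np
from sklearn.model_selection import KFold

X_train = np.arange(10).reshape(-1, 1)  # 10 toy samples
for fold, (tr_idx, val_idx) in enumerate(KFold(n_splits=5).split(X_train)):
    # each iteration: 8 samples to fit on, 2 held out for validation
    print(fold, tr_idx, val_idx)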
It's the mean cross-validation score of the best estimator. Let's make some data and fix the cross-validation split of the data.
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
>>> y = np.linspace(-5, 5, 200)
>>> X = (y + np.random.randn(200)).reshape(-1, 1)
>>> threefold = list(KFold(n_splits=3).split(X))
Now run cross_val_score and GridSearchCV, both with these fixed folds.
>>> cross_val_score(LinearRegression(), X, y, cv=threefold)
array([-0.86060164, 0.2035956 , -0.81309259])
>>> gs = GridSearchCV(LinearRegression(), {}, cv=threefold, verbose=3).fit(X, y)
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV] ................................................................
[CV] ...................................... , score=-0.860602 - 0.0s
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 0.0s
[CV] ................................................................
[CV] ....................................... , score=0.203596 - 0.0s
[CV] ................................................................
[CV] ...................................... , score=-0.813093 - 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished
Note the score=-0.860602, score=0.203596 and score=-0.813093 in the GridSearchCV output; these are exactly the values returned by cross_val_score.
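Since best_score_ is the mean of these per-fold scores (in current scikit-learn versions, a plain unweighted mean), you can verify that directly:

>>> scores = cross_val_score(LinearRegression(), X, y, cv=threefold)
>>> np.isclose(gs.best_score_, scores.mean())
True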
Note that the "mean" is really a macro-average over the folds. The iid
parameter to GridSearchCV
can be used to get a micro-average over the samples instead.
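With iid gone, a micro-average can still be computed by hand by weighting each fold's score by its test-set size (a sketch; the approximate values in the comment are derived from the fold scores above):

>>> fold_sizes = np.array([len(test) for _, test in threefold])  # 67, 67, 66
>>> np.average(scores, weights=fold_sizes)  # micro ≈ -0.4884 vs. macro ≈ -0.4900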