Sklearn and GridSearchCV - Is it expected to return optimal parameters?

I have been working to optimize an SVR model in scikit-learn, but I have been unable to understand how to leverage GridSearchCV.

Consider a slightly modified case of the example code provided in the documentation:

from sklearn import svm, grid_search, datasets
iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C':[1.5, 10]}
svr = svm.SVC()
clf = grid_search.GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)

clf.get_params()

Since I specify that the search for the optimal C value covers just 1.5 and 10, I would expect the returned model to use one of those two values. However, when I look at the output, that does not appear to be the case:

{'cv': None,
 'estimator': SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
   kernel='rbf', max_iter=-1, probability=False, random_state=None,
   shrinking=True, tol=0.001, verbose=False),
 'estimator__C': 1.0,
 'estimator__cache_size': 200,
 'estimator__class_weight': None,
 'estimator__coef0': 0.0,
 'estimator__degree': 3,
 'estimator__gamma': 0.0,
 'estimator__kernel': 'rbf',
 'estimator__max_iter': -1,
 'estimator__probability': False,
 'estimator__random_state': None,
 'estimator__shrinking': True,
 'estimator__tol': 0.001,
 'estimator__verbose': False,
 'fit_params': {},
 'iid': True,
 'loss_func': None,
 'n_jobs': 1,
 'param_grid': {'C': [1.5, 10], 'kernel': ('linear', 'rbf')},
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'score_func': None,
 'scoring': None,
 'verbose': 0}

I suspect I have a fundamental misunderstanding of how to use GridSearchCV and what I can expect it to return. I had expected it to return a classifier with optimized parameters based on my search choices, but this does not appear to be the case.

Any guidance would be greatly appreciated.

Thank you very much.

asked Sep 20 '14 13:09 by amormachine



1 Answer

You should not use get_params here. Use best_params_ or best_estimator_.get_params() instead. get_params gives you back the constructor parameters that you passed to GridSearchCV. One of them is estimator, where you gave it an SVC with default parameters, which is what you see here. That has nothing to do with the parameters that are tried in the grid search.

If you look at the examples (for instance, at the bottom of the dev documentation), you will never see get_params used on GridSearchCV, or actually ever, I think ;) It is the interface that defines how GridSearchCV can use other estimators.
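
To make the difference concrete, here is a minimal sketch of the same search. It assumes a recent scikit-learn release, where GridSearchCV is imported from sklearn.model_selection rather than from the sklearn.grid_search module used in the question. The printed values are left out because the winning combination can depend on the installed version.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV  # assumption: modern import path

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1.5, 10]}

clf = GridSearchCV(svm.SVC(), parameters)
clf.fit(iris.data, iris.target)

# Constructor arguments of the search object: the template SVC is left
# untouched, so this still reports the default C.
print(clf.get_params()['estimator__C'])

# Results of the search: the winning combination from the grid ...
print(clf.best_params_)

# ... and the parameters of the estimator refit with that combination.
print(clf.best_estimator_.get_params()['C'])
```

Since refit is True by default (as in the output in the question), clf.predict(...) already delegates to best_estimator_, so there is usually no need to rebuild the model by hand.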

answered Oct 24 '22 13:10 by Andreas Mueller