
More than one estimator in GridSearchCV (sklearn)

I was reading the sklearn documentation page for GridSearchCV. One of the attributes of a GridSearchCV object is best_estimator_. So here is my question: how do I pass more than one estimator to a GridSearchCV object?

By using a dictionary like {'SVC()': {'C': 10, 'gamma': 0.01}, 'DecTreeClass()': {....}}?

mikinoqwert asked Aug 01 '18 08:08



1 Answer

GridSearchCV searches over parameters. It trains multiple estimators of the same class (SVC, DecisionTreeClassifier, or any other single classifier), one for each parameter combination specified in param_grid. best_estimator_ is the estimator that achieved the best cross-validated score, refit on the whole dataset when refit=True (the default).

So essentially best_estimator_ is an object of that same class, initialized with the best parameters found.
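To make that concrete, here is a minimal sketch of a plain grid search over a single estimator class. The parameter values and cv=5 are illustrative assumptions, not taken from the question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One estimator class, several parameter combinations (values are illustrative)
search = GridSearchCV(SVC(), {"C": [1, 10], "gamma": [0.01, 0.1]}, cv=5)
search.fit(X, y)

best = search.best_estimator_
print(type(best).__name__)   # SVC
print(search.best_params_)   # the winning combination of C and gamma
```

Whatever combination wins, best is still an SVC; only its C and gamma differ.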

So in the basic setup you cannot search over multiple estimator classes in one grid search.

As a workaround, you can wrap the estimator in a Pipeline. The pipeline step then becomes a "parameter" that GridSearchCV can set, so each estimator class can be tried in turn.

Something like this:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
iris_data = load_iris()
X, y = iris_data.data, iris_data.target


# Initialize the pipeline with any estimator; the grid below replaces it
pipe = Pipeline(steps=[('estimator', SVC())])

# Each dict pairs one estimator with that estimator's own parameters
params_grid = [
    {
        'estimator': [SVC()],
        'estimator__C': [1, 10, 100, 1000],
        'estimator__gamma': [0.001, 0.0001],
    },
    {
        'estimator': [DecisionTreeClassifier()],
        'estimator__max_depth': [1, 2, 3, 4, 5],
        # "auto" was removed from max_features in scikit-learn 1.3
        'estimator__max_features': [None, "sqrt", "log2"],
    },
    # {'estimator': [Any_other_estimator_you_want],
    #  'estimator__valid_param_of_your_estimator': [valid_values]},
]

grid = GridSearchCV(pipe, params_grid)

You can add as many dicts to the params_grid list as you like, but make sure each dict contains only parameters that are valid for the 'estimator' it names.
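For completeness, here is a sketch of fitting such a grid and reading back which estimator won. The smaller grids, cv=5, and random_state=0 are assumptions chosen to keep the run short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[("estimator", SVC())])

# Reduced grids (illustrative values): 4 SVC candidates + 3 tree candidates
param_grid = [
    {"estimator": [SVC()],
     "estimator__C": [1, 10],
     "estimator__gamma": [0.001, 0.01]},
    {"estimator": [DecisionTreeClassifier(random_state=0)],
     "estimator__max_depth": [1, 2, 3]},
]

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)

# best_estimator_ is the refit pipeline; its single step holds the winning model
winner = grid.best_estimator_.named_steps["estimator"]
print(type(winner).__name__)
print(grid.best_params_)
print(round(grid.best_score_, 3))
```

Note that best_estimator_ is the whole pipeline here, so the chosen model sits in named_steps["estimator"], and best_params_ includes the 'estimator' key itself alongside that estimator's parameters.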

Vivek Kumar answered Sep 23 '22 12:09