I was reading the scikit-learn documentation page for GridSearchCV.
One of the attributes of a GridSearchCV object is best_estimator_.
So here is my question: how do I pass more than one estimator to a GridSearchCV object? Using a dictionary like this?
{'SVC()':{'C':10, 'gamma':0.01}, 'DecTreeClass()':{....}}
Pipeline can be used to chain multiple estimators into one. This is useful as there is often a fixed sequence of steps in processing the data, for example feature selection, normalization and classification.
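As a minimal sketch of such a chain, a normalization step can be followed by a classifier, and the pipeline then behaves like a single estimator. The step names "scaler" and "clf" and the choice of StandardScaler are illustrative assumptions; nested parameters are addressed as <step name>__<parameter>, which is the same mechanism a grid search uses to reach into a pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each step is a (name, estimator) pair; every step except the last must be a transformer.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),  # normalization step
    ("clf", SVC()),                # final classifier
])

# Nested parameters use the <step name>__<parameter> syntax.
pipe.set_params(clf__C=10)

# fit() fits the scaler, transforms X, then fits the SVC on the scaled data.
pipe.fit(X, y)
predictions = pipe.predict(X[:5])
print(predictions)
```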
GridSearchCV tries every combination of the values passed in the parameter grid and evaluates the model for each combination using cross-validation. After fitting, you therefore have a score (for example accuracy or loss) for every combination of hyperparameters and can choose the one with the best performance.
param_grid – a dictionary with parameter names as keys and lists of parameter values as values.
scoring – the performance measure, for example 'r2' for regression models or 'precision' for classification models.
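A sketch of how these two arguments fit together on the iris data, with illustrative grid values. cv_results_ holds one entry per combination, so 4 values of C times 2 values of gamma gives 8 candidates. 'accuracy' is used here instead of 'precision' because iris has three classes and the plain 'precision' scorer uses binary averaging by default.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keys are parameter names of the estimator; values are lists of candidates.
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001]}

grid = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
grid.fit(X, y)

# One row per combination: every C paired with every gamma.
for params, score in zip(grid.cv_results_["params"], grid.cv_results_["mean_test_score"]):
    print(params, round(score, 3))

print("best:", grid.best_params_, round(grid.best_score_, 3))
```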
The main difference between the two approaches is that in grid search we define the combinations and the model is trained on every one of them, whereas RandomizedSearchCV samples a fixed number of combinations (n_iter) at random from the specified values or distributions. Both are effective ways of tuning hyperparameters to improve how well the model generalizes.
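For contrast, a RandomizedSearchCV sketch: instead of lists it is given distributions to sample from (scipy.stats.loguniform here, an illustrative choice) and it tries only n_iter combinations. The ranges and n_iter=10 are assumptions made for the example.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions instead of fixed lists of values.
param_distributions = {
    "C": loguniform(1e0, 1e3),
    "gamma": loguniform(1e-4, 1e-2),
}

# Only n_iter sampled combinations are evaluated, however large the space is.
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=10, cv=5, random_state=0
)
search.fit(X, y)

print(len(search.cv_results_["params"]))  # number of sampled combinations
print(search.best_params_)
```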
GridSearchCV works on parameters. It trains multiple estimators of the same class (one of SVC, DecisionTreeClassifier, or another classifier), each with a different parameter combination from those specified in param_grid. best_estimator_ is the estimator that performed best on the data (highest mean cross-validated score); with the default refit=True it is refitted on the whole dataset.
So essentially best_estimator_ is an object of the same class, initialized with the best parameters found.
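A quick way to check this, as a sketch with an illustrative max_depth grid: best_estimator_ is an instance of the class that was passed in, carrying the winning parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [1, 2, 3, 4, 5]},
    cv=5,
)
grid.fit(X, y)

best = grid.best_estimator_
# Same class as the estimator passed to GridSearchCV...
print(type(best).__name__)
# ...configured with the best parameters (and refitted on all of X, y by default).
print(best.max_depth == grid.best_params_["max_depth"])
```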
So in the basic setup you cannot search over multiple estimators with GridSearchCV.
But as a workaround, you can use a pipeline in which the estimator itself is a "parameter" that GridSearchCV can set.
Something like this:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
iris_data = load_iris()
X, y = iris_data.data, iris_data.target
# Just initialize the pipeline with any estimator you like;
# GridSearchCV replaces it through the 'estimator' parameter
pipe = Pipeline(steps=[('estimator', SVC())])
# Add a dict of estimator and estimator-related parameters to this list
params_grid = [
    {
        'estimator': [SVC()],
        'estimator__C': [1, 10, 100, 1000],
        'estimator__gamma': [0.001, 0.0001],
    },
    {
        'estimator': [DecisionTreeClassifier()],
        'estimator__max_depth': [1, 2, 3, 4, 5],
        # "auto" is no longer accepted by recent scikit-learn releases
        'estimator__max_features': [None, "sqrt", "log2"],
    },
    # {'estimator': [Any_other_estimator_you_want],
    #  'estimator__valid_param_of_your_estimator': [valid_values]},
]
grid = GridSearchCV(pipe, params_grid)
grid.fit(X, y)
# best_params_['estimator'] shows which estimator won
print(grid.best_params_)
You can add as many dicts to the params_grid list as you like, but make sure that each dict contains only parameters that are valid for its 'estimator'.
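To illustrate what "valid" means here, a sketch that calls set_params directly on a pipeline shaped like the one above (same 'estimator' step name). GridSearchCV applies each candidate to a clone of the pipeline through set_params, so a parameter that the dict's estimator does not have is rejected with a ValueError in the same way.

```python
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline(steps=[("estimator", SVC())])

# Compatible: max_depth is a DecisionTreeClassifier parameter.
pipe.set_params(estimator=DecisionTreeClassifier(), estimator__max_depth=3)
valid_depth = pipe.get_params()["estimator__max_depth"]
print(valid_depth)

# Incompatible: SVC has no max_depth parameter, so set_params raises ValueError.
try:
    pipe.set_params(estimator=SVC(), estimator__max_depth=3)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```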