I want to use StackingClassifier to combine some classifiers and then use GridSearchCV to optimize the parameters:
clf1 = RandomForestClassifier()
clf2 = LogisticRegression()
dt = DecisionTreeClassifier()
sclf = StackingClassifier(estimators=[clf1, clf2], final_estimator=dt)
params = {'randomforestclassifier__n_estimators': [10, 50],
          'logisticregression__C': [1, 2, 3]}
grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5)
grid.fit(x, y)
But this raises an error:
'RandomForestClassifier' object has no attribute 'estimators_'
I have used n_estimators. Why does it complain that there is no estimators_?
Usually GridSearchCV is applied to a single model, so I just need to put that model's parameter names in a dict.
I referred to this page https://groups.google.com/d/topic/mlxtend/5GhZNwgmtSg but it uses parameter names from an earlier version. Even after changing them to the newer names, it doesn't work.
By the way, where can I learn the naming rules for these parameters?
Just like other ensemble techniques, a stacking classifier uses several models for prediction. It is an ensemble method in which the outputs of multiple classifiers are passed as input to a meta-classifier, which makes the final classification.
In mlxtend, the StackingClassifier also enables grid search over the classifiers argument. When there are level-mixed hyperparameters, GridSearchCV will try to replace hyperparameters in a top-down order, i.e., classifiers -> single base classifier -> classifier hyperparameter. For instance, given a hyperparameter grid whose classifiers entry lists (clf1, clf1, clf1) and (clf2, clf3), it will first use the instance settings of either (clf1, clf1, clf1) or (clf2, clf3).
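First of all, estimators needs to be a list of (name, model) tuples, where the name is a label you assign to each model. When you pass bare models instead, scikit-learn still tries to unpack each entry as a (name, model) pair; iterating over an unfitted RandomForestClassifier accesses its estimators_ attribute, which is most likely where your error comes from.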
estimators = [('model1', model()),    # model() is labelled 'model1' (my own choice)
              ('model2', model2())]   # model2() is labelled 'model2' (my own choice)
Next, you need to use the parameter names as they appear in sclf.get_params(). The prefix is the name you gave to the specific model in the estimators list above, followed by a double underscore and the parameter name. So, here, for the model1 parameters you need:
params = {'model1__n_estimators': [5,10]} # model1__SOME_PARAM
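To see the naming rule in practice, here is a minimal sketch that lists the keys GridSearchCV will accept. The labels 'rf' and 'logreg' are arbitrary names chosen for illustration; whatever labels you pick become the prefixes.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# 'rf' and 'logreg' are illustrative labels, not required names.
estimators = [('rf', RandomForestClassifier()),
              ('logreg', LogisticRegression())]
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=DecisionTreeClassifier())

# Every key returned by get_params() can be used as a key in param_grid.
param_names = sorted(sclf.get_params().keys())

# Nested parameters follow the pattern <label>__<parameter>.
rf_params = [p for p in param_names if p.startswith('rf__')]
final_params = [p for p in param_names if p.startswith('final_estimator__')]

print(rf_params)
print(final_params)
```

The parameters of the meta-classifier show up under the final_estimator__ prefix, so they can be tuned in the same grid as the base models.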
Working toy example:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import GridSearchCV
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
estimators = [('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
              ('logreg', LogisticRegression())]
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=DecisionTreeClassifier())
params = {'rf__n_estimators': [5,10]}
grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5)
grid.fit(X, y)
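As a possible extension of the toy example, the same grid can cover both base models and the final estimator, and the result can be read from best_params_. This is only a sketch: the grid values, the smaller dataset, and cv=3 are assumptions picked to keep it quick, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset so the search finishes quickly.
X, y = make_classification(n_samples=300, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0)

estimators = [('rf', RandomForestClassifier(random_state=42)),
              ('logreg', LogisticRegression())]
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=DecisionTreeClassifier(random_state=0))

# Illustrative grid: one parameter per base model plus one for the meta-classifier.
params = {'rf__n_estimators': [5, 10],
          'logreg__C': [0.1, 1.0],
          'final_estimator__max_depth': [2, 4]}

grid = GridSearchCV(estimator=sclf, param_grid=params, cv=3)
grid.fit(X, y)

print(grid.best_params_)  # best combination found among the 8 candidates
print(grid.best_score_)   # mean cross-validated accuracy of that combination
```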