I am attempting to tune an AdaBoost classifier ("ABC") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABC and DTC parameters simultaneously, but am not sure how to accomplish this. A Pipeline shouldn't work, as I am not "piping" the output of DTC into ABC. The idea would be to iterate over hyperparameters for both ABC and DTC in the GridSearchCV estimator.
How can I specify the tuning parameters correctly?
I tried the following, which generated the error shown below.
[IN]

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {dtc__criterion : ["gini", "entropy"],
              dtc__splitter : ["best", "random"],
              abc__n_estimators: [none, 1, 2]}

DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto",
                             class_weight = "auto", max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

[OUT]

ValueError: Invalid parameter dtc for estimator AdaBoostClassifier(algorithm='SAMME.R',
    base_estimator=DecisionTreeClassifier(class_weight='auto', criterion='gini',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
        random_state=11, splitter='best'),
    learning_rate=1.0, n_estimators=50, random_state=11)
There are several things wrong in the code you posted:

1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
2. The key abc__n_estimators should just be n_estimators: you are probably mixing this up with the Pipeline syntax, but here nothing tells Python that the string "abc" refers to your AdaBoostClassifier.
3. None (and not none) is not a valid value for n_estimators. The default value (probably what you meant) is 50.

Here's the code with these fixes. To set the parameters of your Tree estimator you can use the "__" syntax that allows accessing nested parameters.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV
# note: in scikit-learn >= 0.18 this import is
# from sklearn.model_selection import GridSearchCV

param_grid = {"base_estimator__criterion" : ["gini", "entropy"],
              "base_estimator__splitter" : ["best", "random"],
              "n_estimators": [1, 2]}

DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto",
                             class_weight = "auto", max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')
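As an aside (not part of the original answer): if you're ever unsure which keys param_grid will accept for a given estimator, you can list them with get_params(). A quick sketch:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

DTC = DecisionTreeClassifier()
# note: scikit-learn >= 1.2 renames base_estimator to estimator
ABC = AdaBoostClassifier(base_estimator = DTC)

# Every name printed here is a legal param_grid key; the nested
# DecisionTreeClassifier parameters appear with the
# "base_estimator__" prefix.
for name in sorted(ABC.get_params().keys()):
    print(name)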
Also, 1 or 2 estimators does not really make sense for AdaBoost. But I'm guessing this is not the actual code you're running.
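For completeness, here is a minimal end-to-end sketch of running the search and reading the result. The synthetic data from make_classification and the larger n_estimators values are my own additions (not from the question), and I've dropped the deprecated class_weight="auto" / max_features="auto" settings so it runs on current scikit-learn:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# stand-in data for illustration only
X, y = make_classification(n_samples = 500, random_state = 11)

param_grid = {"base_estimator__criterion" : ["gini", "entropy"],
              "base_estimator__splitter" : ["best", "random"],
              "n_estimators": [50, 100]}

DTC = DecisionTreeClassifier(random_state = 11)
ABC = AdaBoostClassifier(base_estimator = DTC)

grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')
grid_search_ABC.fit(X, y)

print(grid_search_ABC.best_params_)  # best combination found
print(grid_search_ABC.best_score_)   # its mean cross-validated roc_auc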
Hope this helps.
Trying to provide a shorter (and hopefully generic) answer.

If you want to grid search within the base estimator of an AdaBoostClassifier, e.g. varying the max_depth or min_samples_leaf of a DecisionTreeClassifier, then you have to use a special syntax in the parameter grid.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

abc = AdaBoostClassifier(base_estimator=DecisionTreeClassifier())

parameters = {'base_estimator__max_depth': [i for i in range(2, 11, 2)],
              'base_estimator__min_samples_leaf': [5, 10],
              'n_estimators': [10, 50, 250, 1000],
              'learning_rate': [0.01, 0.1]}

clf = GridSearchCV(abc, parameters, verbose=3, scoring='f1', n_jobs=-1)
clf.fit(X_train, y_train)  # X_train / y_train: your training data
So, note the 'base_estimator__max_depth' and 'base_estimator__min_samples_leaf' keys in the parameters dictionary. That's the way to access the hyperparameters of a base estimator for an ensemble algorithm like AdaBoostClassifier when you are doing a grid search. Note the __ double-underscore notation in particular. The other two keys in parameters are regular AdaBoostClassifier parameters.
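Once the fit above finishes, the results can be read off the fitted GridSearchCV object. A brief sketch, assuming clf was fitted as shown:

print(clf.best_params_)  # best parameter combination found
print(clf.best_score_)   # mean cross-validated f1 for that combination

# With refit=True (the default), the best model is retrained on the
# full training set and exposed directly:
best_model = clf.best_estimator_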