Using GridSearchCV with AdaBoost and DecisionTreeClassifier

I am attempting to tune an AdaBoost classifier ("ABT") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABT and DTC parameters simultaneously, but am not sure how to accomplish this. A pipeline shouldn't work, as I am not "piping" the output of DTC to ABT. The idea would be to iterate over hyperparameters for both ABT and DTC in the GridSearchCV estimator.

How can I specify the tuning parameters correctly?

I tried the following, which generated the error below.

[IN]

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.grid_search import GridSearchCV

    param_grid = {dtc__criterion : ["gini", "entropy"],
                  dtc__splitter :   ["best", "random"],
                  abc__n_estimators: [none, 1, 2]
                 }

    DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

    ABC = AdaBoostClassifier(base_estimator = DTC)

    # run grid search
    grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

[OUT]

    ValueError: Invalid parameter dtc for estimator AdaBoostClassifier(algorithm='SAMME.R',
          base_estimator=DecisionTreeClassifier(class_weight='auto', criterion='gini', max_depth=None,
            max_features='auto', max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            random_state=11, splitter='best'),
          learning_rate=1.0, n_estimators=50, random_state=11)
GPB asked Aug 25 '15 17:08


2 Answers

There are several things wrong in the code you posted:

  1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
  2. The key "abc__n_estimators" should just be "n_estimators": you are probably mixing this up with the pipeline syntax. Here nothing tells Python that the string "abc" represents your AdaBoostClassifier.
  3. None (and not none) is not a valid value for n_estimators. The default value (probably what you meant) is 50.
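As a side check on points 1 and 2, one way to see which keys a grid may use is to list the names returned by get_params(). This is a minimal sketch, assuming scikit-learn is installed. The variable names (nested, ada, valid_keys) are mine, and the name detection is there because newer releases call the wrapped-estimator argument estimator instead of base_estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: newer scikit-learn releases name the wrapped-estimator argument
# `estimator`, older ones `base_estimator`. Detect which one this install uses.
nested = (
    "estimator"
    if "estimator" in AdaBoostClassifier().get_params()
    else "base_estimator"
)

ada = AdaBoostClassifier(**{nested: DecisionTreeClassifier()})

# get_params(deep=True) also reports the parameters of the wrapped tree,
# each prefixed with "<nested>__".
valid_keys = sorted(ada.get_params(deep=True))
print([key for key in valid_keys if key.startswith(nested + "__")])

# Keys built from arbitrary variable names are not among the valid names.
print("dtc__criterion" in valid_keys)
print(nested + "__criterion" in valid_keys)
```

Any key in param_grid has to be one of these names, which is why a "dtc" prefix is rejected with "Invalid parameter dtc".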

Here's the code with these fixes. To set the parameters of your Tree estimator you can use the "__" syntax that allows accessing nested parameters.

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.grid_search import GridSearchCV

    param_grid = {"base_estimator__criterion" : ["gini", "entropy"],
                  "base_estimator__splitter" :   ["best", "random"],
                  "n_estimators": [1, 2]
                 }

    DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

    ABC = AdaBoostClassifier(base_estimator = DTC)

    # run grid search
    grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

Also, using only 1 or 2 estimators does not really make sense for AdaBoost, which relies on combining many weak learners. But I'm guessing this is not the actual code you're running.
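For readers on a recent scikit-learn, here is a hedged end-to-end sketch of the same grid. Several things in it are my assumptions and not part of the original post: the synthetic data from make_classification, max_depth=2, cv=3, and the n_estimators values. As far as I know, GridSearchCV now lives in sklearn.model_selection, class_weight="auto" was replaced by "balanced", and max_features="auto" is no longer accepted for trees, so the sketch uses the newer spellings and drops max_features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used only so the example runs end to end.
X, y = make_classification(n_samples=200, n_features=10, random_state=11)

# Assumption: the wrapped-estimator argument is `estimator` in newer releases
# and `base_estimator` in older ones, so pick whichever this install exposes.
nested = (
    "estimator"
    if "estimator" in AdaBoostClassifier().get_params()
    else "base_estimator"
)

# max_depth=2 is an arbitrary illustrative choice for a weak learner.
dtc = DecisionTreeClassifier(random_state=11, max_depth=2, class_weight="balanced")
ada = AdaBoostClassifier(**{nested: dtc}, random_state=11)

# Tree parameters use the "<nested>__" prefix; AdaBoost parameters use none.
param_grid = {
    f"{nested}__criterion": ["gini", "entropy"],
    f"{nested}__splitter": ["best", "random"],
    "n_estimators": [10, 50],
}

grid_search = GridSearchCV(ada, param_grid=param_grid, scoring="roc_auc", cv=3)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(grid_search.best_score_)
```

The grid has 2 x 2 x 2 = 8 candidates, each fitted once per fold, so real searches with larger n_estimators values can take noticeably longer.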

Hope this helps.

ldirer answered Sep 20 '22 13:09



Trying to provide a shorter (and hopefully generic) answer.


If you want to grid search over the parameters of the base estimator of an AdaBoostClassifier, e.g. varying the max_depth or min_samples_leaf of a DecisionTreeClassifier, then you have to use a special syntax in the parameter grid.

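    # Import paths assume scikit-learn 0.18 or later
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    # X_train and y_train are assumed to be defined already
    abc = AdaBoostClassifier(base_estimator=DecisionTreeClassifier())

    parameters = {'base_estimator__max_depth':[i for i in range(2,11,2)],
                  'base_estimator__min_samples_leaf':[5,10],
                  'n_estimators':[10,50,250,1000],
                  'learning_rate':[0.01,0.1]}

    clf = GridSearchCV(abc, parameters,verbose=3,scoring='f1',n_jobs=-1)
    clf.fit(X_train,y_train)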

So, note the 'base_estimator__max_depth' and 'base_estimator__min_samples_leaf' keys in the parameters dictionary. That's the way to access the hyperparameters of the base estimator of an ensemble algorithm like AdaBoostClassifier when you are doing a grid search. Note the __ double underscore notation in particular. The other two keys in the parameters dictionary are regular AdaBoostClassifier parameters.
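To make the double underscore routing more concrete, here is a toy illustration in plain Python. It is a simplified sketch of the idea, not scikit-learn's actual code: ToyTree, ToyBoost and set_nested_param are hypothetical names invented for this example. It also expands the grid from the snippet above to show how many candidates it contains.

```python
from itertools import product


class ToyTree:
    """Stand-in for a base estimator (hypothetical, not a scikit-learn class)."""

    def __init__(self, max_depth=None, min_samples_leaf=1):
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf


class ToyBoost:
    """Stand-in for an ensemble wrapping a base estimator (hypothetical)."""

    def __init__(self, base_estimator=None, n_estimators=50, learning_rate=1.0):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate


def set_nested_param(obj, key, value):
    """Set `key` on `obj`; a 'name__rest' key is forwarded to attribute `name`."""
    name, delim, rest = key.partition("__")
    if not hasattr(obj, name):
        raise ValueError(f"Invalid parameter {name!r} for {type(obj).__name__}")
    if delim:
        # Everything after the first "__" belongs to the nested object.
        set_nested_param(getattr(obj, name), rest, value)
    else:
        setattr(obj, name, value)


parameters = {
    "base_estimator__max_depth": list(range(2, 11, 2)),
    "base_estimator__min_samples_leaf": [5, 10],
    "n_estimators": [10, 50, 250, 1000],
    "learning_rate": [0.01, 0.1],
}

# Each combination of values is one candidate the grid search would evaluate.
keys = list(parameters)
candidates = [dict(zip(keys, values)) for values in product(*parameters.values())]
print(len(candidates))  # 5 * 2 * 4 * 2 = 80

# Apply the last candidate to a toy model to see where each value lands.
model = ToyBoost(base_estimator=ToyTree())
for key, value in candidates[-1].items():
    set_nested_param(model, key, value)
print(model.base_estimator.max_depth, model.n_estimators)
```

In this toy version, a key such as "dtc__criterion" fails because the outer object has no attribute called dtc, which mirrors the "Invalid parameter dtc" error in the question.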

Tirtha answered Sep 19 '22 13:09
