Tuning parameters of the classifier used by BaggingClassifier

Say that I want to train a BaggingClassifier that uses a DecisionTreeClassifier:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=1)
bc = BaggingClassifier(dt, n_estimators=500, max_samples=0.5, max_features=0.5)
bc = bc.fit(X_train, y_train)

I would like to use GridSearchCV to find the best parameters for both the BaggingClassifier and the DecisionTreeClassifier (e.g. max_depth for DecisionTreeClassifier and max_samples for BaggingClassifier). What is the syntax for this?

asked Nov 30 '17 by Tim

People also ask

What is hyperparameter tuning in classification?

Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset. Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm.
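For example, a minimal sketch of the distinction on synthetic data (make_classification stands in for a real dataset here): max_depth is a hyperparameter chosen before training, while the tree's splits and feature importances are parameters found during fit.

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# max_depth is a hyperparameter: set by you before fitting
dt = DecisionTreeClassifier(max_depth=3)

# the split thresholds and feature importances are parameters: learned during fit
dt.fit(X, y)
print(dt.get_depth())           # never exceeds the max_depth hyperparameter
print(dt.feature_importances_)  # learned from the data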

What are the hyperparameters of logistic regression?

The main hyperparameters we may tune in logistic regression are: solver, penalty, and regularization strength (sklearn documentation). Solver is the algorithm to use in the optimization problem.
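As a rough illustration (the grid values below are arbitrary examples, not recommendations), all three can be searched with GridSearchCV; liblinear is used because it supports both penalties in the grid:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# C is the inverse of regularization strength (smaller = stronger penalty)
param_grid = {
    'solver': ['liblinear'],
    'penalty': ['l1', 'l2'],
    'C': [0.01, 0.1, 1, 10],
}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid)
search.fit(X, y)
print(search.best_params_)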

What is BaggingClassifier?

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
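A minimal sketch of that behavior on synthetic data: each fitted base classifier sees its own random subset of the rows, and the ensemble aggregates their votes.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

bc = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                       max_samples=0.5, random_state=0)
bc.fit(X, y)

print(len(bc.estimators_))  # 10 trees, each fit on a random 50% sample
print(bc.predict(X[:5]))    # final prediction = majority vote of the trees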

What are hyperparameters in bagging?

An important hyperparameter for the Bagging algorithm is the number of decision trees used in the ensemble. Typically, the number of trees is increased until the model performance stabilizes. Intuition might suggest that more trees will lead to overfitting, although this is not the case.
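One rough way to see that stabilization (a sketch on synthetic data; exact scores will vary) is to cross-validate over an increasing number of trees:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# accuracy typically climbs and then flattens as trees are added
for n in [10, 50, 100, 500]:
    bc = BaggingClassifier(n_estimators=n, random_state=0)
    print(n, round(cross_val_score(bc, X, y, cv=5).mean(), 3))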


1 Answer

I found the solution myself:

from sklearn.model_selection import GridSearchCV

param_grid = {
    'base_estimator__max_depth': [1, 2, 3, 4, 5],
    'max_samples': [0.05, 0.1, 0.2, 0.5]
}

chosen_scoring = 'accuracy'  # or whichever scoring string/callable you need

clf = GridSearchCV(BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=100, max_features=0.5),
                   param_grid, scoring=chosen_scoring)
clf.fit(X_train, y_train)

i.e. the double underscore (__) in base_estimator__max_depth says that max_depth "belongs to" the base_estimator, which is my DecisionTreeClassifier in this case. This works and returns the correct results.
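If you're unsure of the exact key names, get_params() on the estimator lists every parameter GridSearchCV will accept, nested ones included. One caveat: in scikit-learn 1.2+ the base_estimator parameter of BaggingClassifier was renamed to estimator, so there the prefix becomes estimator__ instead. Continuing from the code above:

# list every tunable parameter name, nested ones included
print(sorted(clf.estimator.get_params().keys()))
# ... includes 'base_estimator__max_depth' (or 'estimator__max_depth'
# on scikit-learn 1.2+), 'max_features', 'max_samples', 'n_estimators', ...

# after fitting, inspect the winning combination
print(clf.best_params_)
print(clf.best_score_)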

answered Sep 16 '22 by Tim