
What do n_estimators and max_features mean in RandomForestRegressor?

Tags:

scikit-learn

I was reading about fine-tuning a model using GridSearchCV and came across the parameter grid shown below:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    # 3 x 4 = 12 combinations
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    # 2 x 3 = 6 combinations, with bootstrapping turned off
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
# train across 5 folds, that's a total of (12+6)*5=90 rounds of training
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(housing_prepared, housing_labels)

I don't understand the concept of n_estimators and max_features here. Does n_estimators mean the number of records taken from the data, and max_features the number of attributes to be selected from the data?

Going further, I got this result:

>> grid_search.best_params_
{'max_features': 8, 'n_estimators': 30}

I don't understand what this result is actually telling me.

Viral Parmar asked Sep 15 '17 08:09


2 Answers

According to the documentation for RandomForestRegressor, n_estimators is the number of trees in the forest. Random Forest is an ensemble method that builds multiple decision trees, and this parameter controls how many trees are built. It is not the number of records taken from the data.

max_features, on the other hand, determines the maximum number of features to consider when looking for the best split at each node of a tree. For more information on max_features, read this answer.
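As a rough illustration of the two parameters, the sketch below fits a small forest and inspects it. It uses synthetic data from make_regression rather than the housing data in the question, and the variable names (forest, n_trees, tree_max_features) are only placeholders chosen for this example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data purely for illustration (assumption: 8 numeric features,
# standing in for the prepared housing data from the question).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

forest = RandomForestRegressor(n_estimators=30, max_features=4, random_state=42)
forest.fit(X, y)

# n_estimators -> how many decision trees were built
n_trees = len(forest.estimators_)

# max_features -> how many features each tree considers at every split
tree_max_features = forest.estimators_[0].max_features_

print(n_trees, tree_max_features)
```

Every tree still sees all 8 columns of the data; max_features only limits how many of them are sampled as split candidates at each node.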

Gambit1614 answered Sep 28 '22 01:09


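n_estimators: This is the number of trees you want to build before averaging their predictions (or taking a majority vote, in the case of a classifier). With the default bootstrap=True, each tree is trained on its own bootstrap sample of the training data, so this is also the number of samples the algorithm works on before aggregating them into the final answer. A higher number of trees generally gives better and more stable performance, with diminishing returns, but makes training and prediction slower.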

max_features: The number of features to consider when looking for the best split.
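To make the "aggregate the trees" part concrete, here is a minimal sketch, again on synthetic data rather than the housing set, that averages the per-tree predictions by hand and compares the result with what the forest itself returns. The names per_tree, manual_average and matches are placeholders for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# One row of predictions per tree: shape (n_estimators, n_samples)
per_tree = np.array([tree.predict(X) for tree in forest.estimators_])

# For a regressor, the forest's prediction is the mean over its trees.
manual_average = per_tree.mean(axis=0)
matches = np.allclose(manual_average, forest.predict(X))

print(per_tree.shape, matches)
```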

>> grid_search.best_params_ :- {'max_features': 8, 'n_estimators': 30}

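This means these are the best hyperparameter values found among the candidates in the grid, i.e. n_estimators in {3, 10, 30} and max_features in {2, 4, 6, 8}. The combination of 30 trees and 8 features considered per split had the best cross-validated score, so that is the setting you should use for the model.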
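If you want to see how best_params_ comes about, the following sketch reruns the same grid on synthetic data (an assumption, since the housing data is not available here) and looks at the number of candidates and the winning combination. The winner on this toy data will not necessarily match the result in the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset with 8 features, for illustration only.
X, y = make_regression(n_samples=150, n_features=8, noise=10.0, random_state=42)

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                           cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# Each dict in the grid is expanded separately: 3*4 + 2*3 = 18 candidates,
# and every candidate is scored on 5 folds.
n_candidates = len(grid_search.cv_results_['params'])
print(n_candidates)

# The candidate with the highest mean score (least negative MSE) wins.
print(grid_search.best_params_)
print(grid_search.best_score_)
```

After fitting, grid_search.best_estimator_ holds a forest refit on the full data with the winning values, so you do not have to build it again yourself.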

Gaurav Bansal answered Sep 28 '22 03:09