While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number of features in your data). My questions (for RandomForestRegressor) are:
1) What does max_features correspond to (m or p or something else)?
2) Are m variables selected at random from max_features variables (what is the value of m)?
3) If max_features corresponds to m, then why would I want to set it equal to p for regression (the default)? Where is the randomness with this setting (i.e., how is it different from bagging)?
Thanks.
max_features: This is the maximum number of features the random forest is allowed to try at an individual split (not per tree). There are multiple options available in Python for setting the maximum features.
max_features: The number of features to consider when looking for the best split. If this value is not set, the decision tree will consider all features available to make the best split.
(The parameters of a random forest are the variables and thresholds used to split each node, and they are learned during training.) Scikit-Learn implements a set of sensible default hyperparameters for all models, but these are not guaranteed to be optimal for a given problem.
Generally, we go with a max depth of 3, 5, or 7.
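To make the above concrete, here is a minimal sketch of setting max_features (and max_depth) on a RandomForestRegressor; the dataset is synthetic and the chosen values are illustrative, not recommendations:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data: p = 10 features.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# Each split considers at most 3 of the 10 features, chosen at random;
# max_depth=5 limits tree depth as discussed above.
rf = RandomForestRegressor(n_estimators=100, max_features=3,
                           max_depth=5, random_state=0)
rf.fit(X, y)
print(rf.score(X, y))  # R^2 on the training data
```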
Straight from the documentation:
max_features is the size of the random subsets of features to consider when splitting a node.

So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the "random forest" is actually a bagged ensemble of ordinary regression trees. The docs go on to say:

Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks

By setting max_features differently, you'll get a "true" random forest.
@lynnyi, max_features is the number of features considered at each split, not for the construction of the entire decision tree. To be more clear: during the construction of each decision tree, RF will still use all the features (n_features), but it only considers max_features of them when splitting a node, and those max_features features are randomly selected from the entire feature set. You can confirm this by plotting one decision tree from an RF with max_features=1 and checking all the nodes of that tree to count the number of features involved.
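The check described above can be sketched without plotting by reading the fitted tree's internals (tree_.feature holds the feature index used at each split node, and a negative value marks a leaf):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with p = 8 features.
X, y = make_regression(n_samples=200, n_features=8, random_state=0)

# max_features=1: a single random feature is considered at each split.
rf = RandomForestRegressor(n_estimators=10, max_features=1, random_state=0)
rf.fit(X, y)

tree = rf.estimators_[0].tree_
# tree.feature is negative for leaves; keep only internal (split) nodes.
used = np.unique(tree.feature[tree.feature >= 0])
print(f"Distinct features used for splits in tree 0: {used}")
```

Even with max_features=1, a single tree typically ends up using many different features overall, because a fresh random feature is drawn independently at every split.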