In RandomForestClassifier the default value for max_features is sqrt(n_features), and in RandomForestRegressor it is n_features. Is there any specific reason for that?
This is a heuristic based on empirical results. On average, max_features=sqrt(n_features) seems to be the better default for classification, and max_features=n_features the better default for regression.
The heuristic comes from this paper: http://orbi.ulg.ac.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
In any case, it is of course always a better idea to cross-validate this parameter.
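Cross-validating max_features could look something like the sketch below. It uses GridSearchCV on a synthetic dataset from make_classification. The dataset, the candidate values, and the forest size are all assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely illustrative (assumption: swap in your own X, y).
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=5, random_state=0
)

# Candidate values: "sqrt" is the classifier default, None means all
# features (n_features), and 0.5 means half of the features.
param_grid = {"max_features": ["sqrt", 0.5, None]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation
)
search.fit(X, y)

best_max_features = search.best_params_["max_features"]
print("best max_features:", best_max_features)
print("mean CV accuracy:", search.best_score_)
```

The same pattern should work for RandomForestRegressor; only the estimator (and, if you like, the scoring metric) changes.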