
Why is the default value for max_features in RandomForestClassifier different than the one in RandomForestRegressor?

Tags: scikit-learn

In RandomForestClassifier the default value for max_features is sqrt(n_features), while in RandomForestRegressor it is n_features. Is there a specific reason for that?

d1337 asked Aug 29 '13 05:08


1 Answer

This is a heuristic based on empirical results. On average, max_features=sqrt(n_features) seems to be the better default for classification, and max_features=n_features the better default for regression.

This heuristic stems from this paper: http://orbi.ulg.ac.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
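To make the two settings concrete, here is a minimal sketch that sets them explicitly instead of relying on the defaults (which may differ between scikit-learn versions). The synthetic datasets, the 16-feature size, and the small n_estimators value are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic data with 16 features (illustrative assumption, not from the question)
X_clf, y_clf = make_classification(n_samples=200, n_features=16, random_state=0)
X_reg, y_reg = make_regression(n_samples=200, n_features=16, random_state=0)

# Classification heuristic: consider sqrt(n_features) candidate features per split
clf = RandomForestClassifier(n_estimators=10, max_features="sqrt", random_state=0)
clf.fit(X_clf, y_clf)

# Regression heuristic: consider all n_features per split (None means n_features)
reg = RandomForestRegressor(n_estimators=10, max_features=None, random_state=0)
reg.fit(X_reg, y_reg)

# Each fitted tree records the resolved number of features in max_features_
clf_features_per_split = clf.estimators_[0].max_features_
reg_features_per_split = reg.estimators_[0].max_features_
print(clf_features_per_split, reg_features_per_split)
```

With 16 input features, the classifier's trees should draw 4 candidate features per split and the regressor's trees all 16.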

In any case, it is always better to choose this parameter by cross-validation.
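One possible way to cross-validate max_features is a small grid search. The sketch below uses GridSearchCV on synthetic data; the candidate values, dataset sizes, fold count, and the helper name tune_max_features are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely for illustration
X_clf, y_clf = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_reg, y_reg = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Candidates: sqrt(n_features), half of the features, all features (None)
param_grid = {"max_features": ["sqrt", 0.5, None]}


def tune_max_features(estimator, X, y):
    """Hypothetical helper: 3-fold grid search over max_features."""
    search = GridSearchCV(estimator, param_grid, cv=3)
    search.fit(X, y)
    return search


clf_search = tune_max_features(
    RandomForestClassifier(n_estimators=25, random_state=0), X_clf, y_clf
)
reg_search = tune_max_features(
    RandomForestRegressor(n_estimators=25, random_state=0), X_reg, y_reg
)

# Default scoring: accuracy for the classifier, R^2 for the regressor
print("classification:", clf_search.best_params_, clf_search.best_score_)
print("regression:", reg_search.best_params_, reg_search.best_score_)
```

The selected value depends on the dataset, so the result on real data may or may not agree with the heuristic defaults.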

Gilles Louppe answered Sep 19 '22 17:09