I'm confused about the meaning of the stepFactor parameter of the tuneRF function, which is used to tune the mtry parameter that is then passed to the randomForest function. The documentation of tuneRF says that stepFactor is the magnitude by which the chosen mtry gets deflated or inflated.
Obviously, since mtry is the number of variables chosen randomly, it has to be an integer; however, I have seen many examples online that use stepFactor=1.5. At first I thought that, by default, R sets the next mtry to floor(mtry_current-stepFactor), but it turned out that I was wrong.
Moreover, I do not understand the messages search left... and search right... that R displays while tuneRF is working. I thought they indicated whether the mtry parameter was being deflated or inflated, but my suppositions did not turn out to be correct.
To sum up this long and not too graceful description of my doubts, my questions are:
1. Why is stepFactor NOT an integer?
2. How are subsequent mtry values chosen?
3. What does searching left/right actually mean?
Any help would be very much appreciated! :)
The number of variables selected at each split is denoted by mtry in the randomForest function. Select the mtry value with the minimum out-of-bag (OOB) error. In this case, mtry = 4 is the best mtry, as it has the lowest OOB error; mtry = 4 was also the default mtry.
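As a small illustration of picking the mtry with the lowest OOB error, here is a Python sketch (not R). It assumes the results are available as (mtry, OOB error) pairs, such as the rows of the table tuneRF reports; the function name best_mtry and the numbers in the example call are hypothetical.

```python
from typing import Sequence, Tuple


def best_mtry(results: Sequence[Tuple[int, float]]) -> int:
    """Return the mtry with the smallest OOB error.

    `results` is a sequence of (mtry, oob_error) pairs. If several pairs
    share the minimum error, min() keeps the first one.
    """
    if not results:
        raise ValueError("results must not be empty")
    mtry, _ = min(results, key=lambda row: row[1])
    return mtry


# Hypothetical OOB errors for three mtry values.
print(best_mtry([(2, 0.31), (4, 0.27), (8, 0.29)]))  # 4
```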
mtry : the number of variables to randomly sample as candidates at each split.
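Because mtry is a count of variables, it has to be a whole number between 1 and p (the number of predictors). A hedged Python sketch of the sampling step for one split; the function name split_candidates is hypothetical, and this only illustrates the idea rather than reproducing randomForest's internal code.

```python
import random
from typing import List, Optional


def split_candidates(p: int, mtry: int, rng: Optional[random.Random] = None) -> List[int]:
    """Draw `mtry` distinct predictor indices out of `p` for a single split."""
    # mtry counts variables, so a fractional value has no meaning here.
    if not isinstance(mtry, int) or not 1 <= mtry <= p:
        raise ValueError("mtry must be an integer between 1 and p")
    rng = rng or random.Random()
    # Sample without replacement: each candidate variable appears at most once.
    return rng.sample(range(p), mtry)


print(split_candidates(p=10, mtry=3, rng=random.Random(0)))
```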
One nice trick I use is to start by taking the square root of the number of predictors and plugging that value in for mtry. It is usually around the same value that the tuneRF function in the randomForest package would pick. This rule of thumb is for classification only!
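The rule of thumb, together with the regression default, as a Python sketch. As I read randomForest's documentation and source, the default mtry is floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression; treat that as an assumption, and note that default_mtry is a hypothetical name.

```python
import math


def default_mtry(p: int, classification: bool) -> int:
    """Default mtry for p predictors (assumed to match randomForest's defaults)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if classification:
        # Square root of the number of predictors, rounded down.
        return math.isqrt(p)
    # One third of the predictors, rounded down, but at least 1.
    return max(p // 3, 1)


print(default_mtry(64, classification=True))   # 8
print(default_mtry(30, classification=False))  # 10
```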
mtry: Number of variables randomly sampled as candidates at each split. ntree: Number of trees to grow.
Below is a summary of how tuneRF works:
1a. Set mtry to the default value of sqrt(p) for classification and p/3 for regression (where p = total number of variables), rounded down to an integer.
1b. Compute the out-of-bag (OOB) error (say error_default) for a Random Forest with mtry set to the default value found above.
2a. Look to the left: set mtry = default value/stepFactor. For instance, if stepFactor=1.5 and your default starting value is 8, mtry would be set to 8/1.5=5.33, rounded up to an integer, which gives 6.
2b. Compute the OOB error, say error_left.
3a. Look to the right: set mtry = default value*stepFactor. To continue with my example, mtry would be set to 8*1.5=12 (a non-integer product is rounded down).
3b. Compute the OOB error, say error_right.
4.i. If (error_default < error_right) AND (error_default < error_left), the best mtry is the default value.
4.ii. If the previous condition is not met, but the relative improvement over error_default (e.g. 1 - error_right/error_default) is less than the improve parameter, the best mtry is the default value.
4.iii. Without loss of generality, if neither condition is met, error_right < error_left, and the relative improvement of error_right over error_default is greater than improve, set mtry to mtry_right (12). From now on, always go to the right.
5. If 4.iii is verified, iterate: set mtry to mtry_right*stepFactor (in my example, 12*1.5=18), compute the OOB error and compare it with the error obtained at the previous step (in my example, for mtry=12). If the new error is smaller and the gain in error reduction is large enough (i.e., > improve), select the new mtry and keep repeating these steps; otherwise stop and return the current mtry as the best mtry.
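The steps of this answer can be sketched in Python (not R). This follows the answer's description and is not a port of tuneRF's source. oob_error is a hypothetical stand-in for fitting a forest with a given mtry and reading its OOB error, and all names are illustrative. Going left rounds up, as in the 8/1.5 to 6 example; going right rounds down and is capped at p, which is my assumption about what randomForest does.

```python
import math
from typing import Callable


def next_mtry(mtry: int, step_factor: float, p: int, direction: str) -> int:
    """Deflate (left) or inflate (right) mtry and round back to a valid integer."""
    if direction == "left":
        # Divide and round up, never below 1 (8 / 1.5 = 5.33 -> 6).
        return max(1, math.ceil(mtry / step_factor))
    # Multiply and round down, never above p (assumed rounding rule).
    return min(p, math.floor(mtry * step_factor))


def relative_gain(old_err: float, new_err: float) -> float:
    """Relative reduction in OOB error; this is what gets compared with `improve`."""
    if old_err == 0:
        return 0.0  # a zero error leaves no room for improvement
    return 1 - new_err / old_err


def tune_mtry(oob_error: Callable[[int], float], p: int, start: int,
              step_factor: float = 2.0, improve: float = 0.05) -> int:
    # Steps 1-3: OOB error at the start value and one step to each side.
    best, best_err = start, oob_error(start)
    neighbours = {}
    for direction in ("left", "right"):
        m = next_mtry(start, step_factor, p, direction)
        if m != start:  # rounding can leave mtry unchanged
            neighbours[direction] = (m, oob_error(m))
    if not neighbours:
        return best
    # Step 4: take the better neighbour only if it improves enough on the start value.
    direction, (m, err) = min(neighbours.items(), key=lambda item: item[1][1])
    if relative_gain(best_err, err) <= improve:
        return best
    best, best_err = m, err
    # Step 5: keep moving in the same direction while each step gains more than `improve`.
    while True:
        m = next_mtry(best, step_factor, p, direction)
        if m == best:  # hit 1 or p, or rounding stopped moving
            return best
        err = oob_error(m)
        if relative_gain(best_err, err) <= improve:
            return best
        best, best_err = m, err


# Toy error curve (hypothetical) with its minimum at mtry = 18.
print(tune_mtry(lambda m: abs(m - 18) + 1, p=40, start=8, step_factor=1.5))  # 18
```

On the toy curve the sketch visits 8, then 6 and 12, picks the right side, and continues to 18 before 27 makes the error worse.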
The smaller the stepFactor you set (e.g., 1.1, 1.2), the more values of mtry you try (fine search); the bigger the stepFactor (e.g., 2, 2.5), the fewer values you try (rough search). Also, with low values of improve, the search will continue longer.
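To see the fine-versus-rough effect, here is a Python sketch that lists every mtry value the search could reach on each side of a start value if improve never stopped it early. It assumes rounding up when going left and rounding down when going right, clamped to [1, p]; mtry_ladder is a hypothetical name, not part of randomForest.

```python
import math
from typing import List, Tuple


def mtry_ladder(start: int, p: int, step_factor: float) -> Tuple[List[int], List[int]]:
    """All mtry values reachable left and right of `start`, ignoring `improve`."""
    left, m = [], start
    while True:
        nxt = max(1, math.ceil(m / step_factor))  # deflate, round up
        if nxt == m:  # rounding (or the lower bound 1) stopped the search
            break
        left.append(nxt)
        m = nxt
    right, m = [], start
    while True:
        nxt = min(p, math.floor(m * step_factor))  # inflate, round down
        if nxt == m:  # rounding (or the upper bound p) stopped the search
            break
        right.append(nxt)
        m = nxt
    return left, right


for factor in (1.1, 1.5, 2.0, 2.5):
    print(factor, mtry_ladder(start=8, p=40, step_factor=factor))
```

Under these assumptions, with start=8 and p=40, stepFactor=1.5 can reach eight values (6, 4, 3, 2 and 12, 18, 27, 40) while 2.5 reaches only five. The sketch also shows a caveat: at 1.1, both 8/1.1 and 8*1.1 round back to 8, so from a small start value a very fine stepFactor may not move at all.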