
The explanation of the verbose mode during running randomForest in R

I am running randomForest in R in verbose mode (do.trace), and I was wondering what the columns in the output mean. I can see that ntree is the number of trees and OOB is the % of out-of-bag samples, but what are "1" and "2"?

> rf.m <- randomForest(x = X.train, y=as.factor(y.train), do.trace=10)
ntree      OOB      1      2
   10:  32.03% 15.60% 82.47%
   20:  29.18% 10.51% 86.31%
   30:  27.44%  7.47% 88.57%
   40:  26.48%  5.29% 91.33%
   50:  25.92%  4.35% 91.96%
   ....
Alby asked Jan 24 '15 16:01



1 Answer

Columns 1 and 2 in the output give the out-of-bag classification error for each class, in the order of the factor levels of y. The OOB value is the overall out-of-bag error rate, which equals the weighted average of the class errors (weighted by the fraction of observations in each class).
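As a rough check against the trace in the question, the relationship can be inverted: with only two classes, the OOB value and the two class errors determine the fraction of observations in class 1. The sketch below is base R only; the numbers are copied from the question's trace and the variable names are purely illustrative. Early in a run some observations may not yet have an out-of-bag prediction, so the result should be read as approximate.

```r
# Trace values from the question, in percent (rows: ntree = 10, 20, ..., 50)
oob  <- c(32.03, 29.18, 27.44, 26.48, 25.92)
err1 <- c(15.60, 10.51,  7.47,  5.29,  4.35)
err2 <- c(82.47, 86.31, 88.57, 91.33, 91.96)

# OOB = w1 * err1 + (1 - w1) * err2, solved for w1 (fraction in class 1)
w1 <- (err2 - oob) / (err2 - err1)
print(round(w1, 3))
```

Every row implies that roughly three quarters of the observations belong to class 1, which is why the overall OOB error stays much closer to the class 1 error than to the class 2 error.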

An example (adapting the random forest example from the help page):

library(randomForest)

# Print the running OOB and per-class error rates every 100 trees
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                        proximity=TRUE, do.trace=100)

ntree      OOB      1      2      3
  100:   6.00%  0.00%  8.00% 10.00%
  200:   5.33%  0.00%  6.00% 10.00%
  300:   6.00%  0.00%  8.00% 10.00%
  400:   4.67%  0.00%  8.00%  6.00%
  500:   5.33%  0.00%  8.00%  8.00%
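To see which class each numbered column refers to, one option is to look at the err.rate component of the fitted object, which stores the same running error rates with the class labels as column names. This is a minimal sketch, assuming the randomForest package is installed; the exact values depend on the R version and the random seed, so none are shown here.

```r
library(randomForest)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris)

# One row per tree count; columns are "OOB" followed by the class levels
colnames(iris.rf$err.rate)

# Rows at the same tree counts as the trace lines, expressed as fractions
iris.rf$err.rate[seq(100, 500, by = 100), ]
```

Columns 1, 2 and 3 of the trace follow the order of levels(iris$Species).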

The weighted average of the class errors after the first 100 trees gives an OOB error rate of 6.0%, exactly as reported in the trace above. prop.table returns the fraction of observations in each category (each class) of Species. We multiply that element-wise by the class errors at ntree = 100, as given in the trace above, and then sum to get the weighted average error over all classes (the OOB error).

sum(prop.table(table(iris$Species)) * c(0, 0.08, 0.10))
[1] 0.06

You can avoid needing to use sum if you use matrix multiplication, which here is equivalent to the dot/scalar/inner product and returns a 1x1 matrix:

prop.table(table(iris$Species)) %*% c(0, 0.08, 0.10)
     [,1]
[1,] 0.06
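The same check can be run for every line of the trace at once. The sketch below uses base R only; the class errors are typed in from the trace above, and the object names are illustrative.

```r
# Class errors (in percent) from the trace, one row per printed line
class.err <- rbind(c(0, 8, 10),
                   c(0, 6, 10),
                   c(0, 8, 10),
                   c(0, 8,  6),
                   c(0, 8,  8))
rownames(class.err) <- seq(100, 500, by = 100)

# Fraction of observations in each class (1/3 each for iris)
w <- as.vector(prop.table(table(iris$Species)))

# Weighted average per row; compare with the OOB column of the trace
oob <- round(drop(class.err %*% w), 2)
print(oob)
```

The results, 6.00, 5.33, 6.00, 4.67 and 5.33, reproduce the OOB column line by line.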
eipi10 answered Nov 14 '22 22:11
