I am running randomForest in R with verbose mode (do.trace), and I was wondering what the columns in the trace output mean.
I can see that ntree is the number of trees and OOB is the % of out-of-bag samples, but what are "1" and "2"?
> rf.m <- randomForest(x = X.train, y=as.factor(y.train), do.trace=10)
ntree OOB 1 2
10: 32.03% 15.60% 82.47%
20: 29.18% 10.51% 86.31%
30: 27.44% 7.47% 88.57%
40: 26.48% 5.29% 91.33%
50: 25.92% 4.35% 91.96%
....
Columns 1 and 2 in the output give the classification error for each class. The OOB value is the overall out-of-bag error rate (not the percentage of out-of-bag samples); it is the weighted average of the class errors, weighted by the fraction of observations in each class.
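To make that relationship concrete, here is a minimal base-R sketch. The confusion matrix is made up for illustration (the counts are hypothetical, not taken from the question's data); it only shows that the overall error equals the class errors weighted by class frequency.

```r
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# The counts are invented for illustration only.
conf <- matrix(c(90, 10,
                 20, 30),
               nrow = 2, byrow = TRUE,
               dimnames = list(truth = c("1", "2"), predicted = c("1", "2")))

# Per-class error: share of each true class that was misclassified.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall error: all misclassified observations over all observations.
overall_error <- 1 - sum(diag(conf)) / sum(conf)

# Weighted average of the class errors, weighted by class frequency.
class_weight <- rowSums(conf) / sum(conf)
weighted_error <- sum(class_weight * class_error)

print(class_error)
print(overall_error)
print(weighted_error)
```

In this toy case the large class has a low error and the small class a high one, so the overall error sits much closer to the large class's error, which is the same pattern as in the trace from the question.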
An example (adapting the random forest example from the help page):
library(randomForest)
# Print a line of trace output every 100 trees
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                        proximity=TRUE, do.trace=100)
ntree OOB 1 2 3
100: 6.00% 0.00% 8.00% 10.00%
200: 5.33% 0.00% 6.00% 10.00%
300: 6.00% 0.00% 8.00% 10.00%
400: 4.67% 0.00% 8.00% 6.00%
500: 5.33% 0.00% 8.00% 8.00%
The weighted average of the class errors after 100 trees gives an OOB error rate of 6.0%, exactly as reported in the trace above. prop.table returns the fraction of observations in each class of Species. We multiply that element-wise by the class errors at ntree = 100, as given in the trace above, and then sum to get the weighted average error over all classes (the OOB error).
sum(prop.table(table(iris$Species)) * c(0, 0.08, 0.10))
[1] 0.06
You can avoid needing to use sum if you use matrix multiplication, which here is equivalent to the dot/scalar/inner product:
prop.table(table(iris$Species)) %*% c(0, 0.08, 0.10)
     [,1]
[1,] 0.06
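The same relationship can be turned around for the two-class trace in the question: given the OOB error and the two class errors, one can estimate the class proportions. This is only a sketch. It assumes every observation has an out-of-bag prediction at that point and that the rounding in the printed percentages is negligible; the variable names are made up for illustration.

```r
# Error rates copied from the question's trace at ntree = 10.
oob  <- 0.3203
err1 <- 0.1560
err2 <- 0.8247

# Solve oob = p1 * err1 + (1 - p1) * err2 for p1,
# the fraction of observations in class 1.
p1 <- (err2 - oob) / (err2 - err1)
p2 <- 1 - p1

print(round(c(class1 = p1, class2 = p2), 3))
```

Under those assumptions, roughly three quarters of the training observations belong to class 1. That imbalance would explain why the overall OOB error stays moderate even though class 2 is misclassified most of the time.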