While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long:
> set.seed(1);
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = TRUE), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
user system elapsed
0.99 0.00 0.98
> print(system.time({rfmod <- train(x = x, y = y,
+ method = "rf",
+ metric = "Accuracy",
+ trControl = trainControl(classProbs = T)
+ );}));
user system elapsed
95.83 0.71 97.26
It seemed to me that execution should take only about 10 times longer, since by default 10-fold cross-validation happens instead of a single run. I am not tuning any parameters, but it seems that train does so automatically:
> rfmod$results
mtry Accuracy Kappa AccuracySD KappaSD
1 2 0.4736669 -0.04437013 0.03323485 0.06493845
2 16 0.4818095 -0.03241901 0.03279341 0.06426745
3 30 0.4878361 -0.02149108 0.02956972 0.05936881
That would explain at most a 30-fold difference. However, it runs almost 100 times longer. What could explain this?
Thanks in advance
You are not specifying method in trainControl, so it defaults to 30 iterations of the bootstrap and, since tuneLength was also not set, you are doing it over 3 values of mtry.
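To make the count concrete, here is a small base-R sketch. It assumes train() fits one model per resample for each candidate mtry and then one final model with the chosen mtry on the full data; count_rf_fits is a hypothetical helper for illustration, not part of caret, and the 30 resamples and 3 mtry values are the figures stated in the answer.

```r
# Hypothetical helper (not a caret function): number of randomForest fits
# train() performs, assuming one fit per (resample, candidate mtry) pair
# plus one final fit of the selected mtry on the full training set.
count_rf_fits <- function(n_resamples, n_mtry) {
  n_resamples * n_mtry + 1
}

# Settings described in the answer: 30 bootstrap resamples x 3 mtry values
default_fits <- count_rf_fits(n_resamples = 30, n_mtry = 3)

# What the question expected: 10-fold CV with a single, fixed mtry
cv_fits <- count_rf_fits(n_resamples = 10, n_mtry = 1)

print(c(default = default_fits, cv_single_mtry = cv_fits))
```

Under these assumptions the default run does 91 fits versus 11 for the setup the question had in mind, which lines up with a roughly 100-fold versus 10-fold slowdown.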
A 99.2449-fold slowdown should not be unexpected when you multiply the computational cost 90-fold (30 resamples times 3 values of mtry).
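If the goal is the roughly 10-fold cost the question expected, one option is to request 10-fold cross-validation explicitly and pin mtry with tuneGrid so no grid search happens. This is a sketch, assuming the caret and randomForest packages are installed; the column names X1..X30 are added for illustration because recent caret versions ask for named predictor columns, and mtry = floor(sqrt(m)) mirrors randomForest's default for classification.

```r
library(caret)         # train(), trainControl()
library(randomForest)  # backend used by method = "rf"

set.seed(1)
n <- 500
m <- 30
x <- matrix(rnorm(n * m), nrow = n)
colnames(x) <- paste0("X", seq_len(m))  # illustrative names; caret wants named columns
y <- factor(sample.int(2, n, replace = TRUE), labels = c("yes", "no"))

# Ask for 10-fold CV instead of relying on the bootstrap default
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)

# Fix mtry at floor(sqrt(m)) = 5 so only one candidate value is evaluated
rfmod <- train(x = x, y = y,
               method = "rf",
               metric = "Accuracy",
               trControl = ctrl,
               tuneGrid = data.frame(mtry = floor(sqrt(m))))

print(rfmod$results)  # one row: the single mtry value
```

With one mtry value and 10 folds, train() should perform about 11 forest fits (10 resamples plus the final model), so its run time should land near 10 times that of a single randomForest() call.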
Max