
caret train rf model - inexplicably long execution

While training a random forest model with the caret package, I noticed that the execution time is inexplicably long:

> set.seed(1);
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = TRUE), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed 
   0.99    0.00    0.98 
> print(system.time({rfmod <- train(x = x, y = y,
+                method = "rf",
+                metric = "Accuracy",
+                trControl = trainControl(classProbs = TRUE)
+ );}));
   user  system elapsed 
  95.83    0.71   97.26 

I expected it to take only about 10 times longer, since I assumed that train runs 10-fold cross-validation by default instead of a single fit. I am not tuning any parameters explicitly, but train appears to do so automatically:

> rfmod$results
  mtry  Accuracy       Kappa AccuracySD    KappaSD
1    2 0.4736669 -0.04437013 0.03323485 0.06493845
2   16 0.4818095 -0.03241901 0.03279341 0.06426745
3   30 0.4878361 -0.02149108 0.02956972 0.05936881

That would explain at most a 30-fold difference. However, it runs almost 100 times longer. What could explain this?

Thanks in advance

maksay asked Dec 26 '22 06:12


1 Answer

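You are not specifying method in trainControl, so it defaults to 25 iterations of the bootstrap and, since tuneLength was also not set, you are doing it over 3 values of mtry.

A roughly 100-fold slowdown should not be unexpected when you multiply the computational cost by 75 (25 resamples times 3 values of mtry, plus one final fit on the full data set). The larger mtry values (16 and 30) also cost more per forest than randomForest's default of floor(sqrt(30)) = 5.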
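
The counting above, and one way to get the cheaper run the question expected, can be sketched as follows. This is a minimal sketch, assuming caret's documented defaults of 25 bootstrap resamples and a 3-value tuning grid for method = "rf". The names n_resamples, n_mtry, ctrl, grid and rfmod_cv are illustrative, and fixing mtry at floor(sqrt(m)) is an assumption chosen to mirror randomForest's own default for classification.

```r
library(caret)
library(randomForest)

# Same simulated data as in the question.
set.seed(1)
n <- 500
m <- 30
x <- matrix(rnorm(n * m), nrow = n)
# Defensive: some caret versions reject a predictor matrix without column names.
colnames(x) <- paste0("V", seq_len(m))
y <- factor(sample.int(2, n, replace = TRUE), labels = c("yes", "no"))

# Default train(): 25 bootstrap resamples x 3 mtry values, plus one final fit.
n_resamples <- 25
n_mtry <- 3
default_fits <- n_resamples * n_mtry + 1   # forests grown by the default call

# 10-fold CV with a single fixed mtry: 10 fits plus one final fit.
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)
grid <- data.frame(mtry = floor(sqrt(m)))  # assumption: randomForest's classification default
cv_fits <- ctrl$number * nrow(grid) + 1

print(c(default = default_fits, cv = cv_fits))

# Only one candidate mtry, so no tuning takes place.
print(system.time({
  rfmod_cv <- train(x = x, y = y,
                    method = "rf",
                    metric = "Accuracy",
                    trControl = ctrl,
                    tuneGrid = grid)
}))
```

With these settings train grows 11 forests instead of 76, so the run time should be much closer to the "about 10 times a single fit" that the question expected. Exact timings depend on the machine.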

Max

topepo answered Jan 11 '23 02:01