
caret train() with method 'rf': how long should it take on big data?

My data has 500,000 observations and 7 variables. I split it into 80% training data and 20% test data and used caret to train the model. The code is below. I started it, but it was taking so long that I eventually had to stop it. Is there anything wrong with my model, or does it usually take this long on big data? Any suggestions?

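library(caret)
set.seed(130000000)

# x/y interface: predictors are columns 3 to 5, the outcome is train$active.
# No data= argument is needed when x and y are passed directly.
classifier_rf <- train(x=train[3:5],
                       y=train$active,
                       method='rf',
                       # 10-fold cross-validation, repeated 10 times
                       trControl=trainControl(method='repeatedcv',
                                              number=10,
                                              repeats=10))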
I. Ara asked Sep 17 '25 14:09


2 Answers

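Your best bet is probably to try parallelizing the process. caret runs its resampling loop through the foreach package, so once a parallel backend is registered the individual folds can be fitted on several cores at once. For a useful resource, click here.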
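A minimal sketch of what that can look like, assuming the doParallel and randomForest packages are installed alongside caret. The data frame `train_df` and its columns are hypothetical stand-ins for the asker's data, and the resampling is cut down to 5-fold CV with 100 trees so the sketch runs quickly; none of these values come from the question.

```r
library(caret)
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for the asker's data: a factor outcome and 3 predictors.
set.seed(1)
n <- 2000
train_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
train_df$active <- factor(ifelse(train_df$x1 + rnorm(n) > 0, "yes", "no"))

# Start a cluster, leaving one core free; fall back to 1 worker if the
# core count cannot be detected.
workers <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
cl <- makePSOCKcluster(workers)
registerDoParallel(cl)

# allowParallel = TRUE is caret's default; it is spelled out here for clarity.
fit <- train(x = train_df[, c("x1", "x2", "x3")],
             y = train_df$active,
             method = "rf",
             ntree = 100,  # passed through to randomForest()
             trControl = trainControl(method = "cv",
                                      number = 5,
                                      allowParallel = TRUE))

# Release the workers and switch foreach back to sequential execution.
stopCluster(cl)
registerDoSEQ()

print(fit$results)
```

Each worker holds its own copy of the training data, so on 400,000 rows it may be worth watching memory use before raising the number of workers.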

Sibs answered Sep 20 '25 06:09


From my understanding, caret's method='rf' still calls the randomForest function from the randomForest package underneath and adds the cross-validation and grid search on top, so it will take a while.
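To get a feel for how much work that is, here is a rough count of the forests the call in the question would fit. The fold and repeat counts come from the question; the number of mtry candidates (`n_mtry`) is an assumption used only for illustration, since the size of caret's tuning grid depends on the number of predictors and on tuneLength.

```r
# From the question: 10-fold CV repeated 10 times on 80% of 500,000 rows.
folds      <- 10
repeats    <- 10
n_train    <- 0.8 * 500000

# Assumption: number of mtry values caret evaluates (illustrative only).
n_mtry     <- 3

# One forest per fold, per repeat, per candidate mtry value ...
resample_fits <- folds * repeats * n_mtry
# ... plus one final fit on the full training set with the best mtry.
total_fits    <- resample_fits + 1

# Each resampled forest is trained on (folds - 1) / folds of the training rows.
rows_per_fit  <- n_train * (folds - 1) / folds

cat("forests fitted:", total_fits, "\n")
cat("rows per resampled fit:", format(rows_per_fit, scientific = FALSE), "\n")
```

Under these assumptions that is 301 forests, each with randomForest's default of 500 trees, so a long runtime is expected rather than a sign that the model is wrong. Dropping to plain 5- or 10-fold CV without repeats would cut the count by roughly an order of magnitude.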

For random forest models specifically, I usually just use the ranger package, which is much faster. You can find its manual here.
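A minimal sketch of fitting a forest with ranger directly, assuming the package is installed. The data frame `train_df` is a hypothetical stand-in for the asker's data, and the values for `num.trees` and `num.threads` are illustrative choices, not recommendations from the answer.

```r
library(ranger)

# Hypothetical stand-in for the asker's data: a factor outcome and 3 predictors.
set.seed(1)
n <- 2000
train_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
train_df$active <- factor(ifelse(train_df$x1 + rnorm(n) > 0, "yes", "no"))

# A factor outcome makes ranger grow a classification forest.
rf <- ranger(dependent.variable.name = "active",
             data = train_df,
             num.trees = 500,
             num.threads = 2,  # ranger builds trees in parallel across threads
             seed = 1)

# Out-of-bag misclassification rate, computed during training at no extra cost.
print(rf$prediction.error)
```

If the caret workflow is still wanted, caret also offers method='ranger', which swaps the underlying engine while keeping the same train() interface.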

timxymo1225 answered Sep 20 '25 04:09