I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads to use while fitting the models, in the sense of building the trees in parallel. Caret's train function, on the other hand, parallelizes at the resampling level, for example by running a process for each fold in a k-fold CV. Is this understanding correct? If so, is it better to:
1. Register a parallel back-end (for example with the doMC package and the registerDoMC function), set nthread=1 via caret's train function so it passes that parameter to xgboost, set allowParallel=TRUE in trainControl, and let caret handle the parallelization for the cross-validation; or
2. Disable caret's parallelization (allowParallel=FALSE and no parallel back-end registration) and set nthread to the number of physical cores, so the parallelization is contained exclusively within xgboost.

Or is there no "better" way to perform the parallelization?
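To make the two options concrete, here is roughly what I have in mind (just a sketch, not my actual setup: the simulated data, the 5-fold CV, and the count of 4 cores are placeholders):

library(caret)
library(doMC)

set.seed(1)
dat <- twoClassSim(200)   # small simulated data set just so the sketch runs

## Option 1: caret parallelizes the resampling loop, xgboost stays single-threaded
registerDoMC(cores = 4)
ctrl_1 <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit_1 <- train(Class ~ ., data = dat, method = "xgbTree",
               trControl = ctrl_1, nthread = 1)

## Option 2: no parallel back-end; all parallelism lives inside xgboost
ctrl_2 <- trainControl(method = "cv", number = 5, allowParallel = FALSE)
fit_2 <- train(Class ~ ., data = dat, method = "xgbTree",
               trControl = ctrl_2, nthread = 4)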
Edit: I ran the code suggested by @topepo, with tuneLength = 10 and search="random", and specifying nthread=1 on the last line (otherwise I understand that xgboost will use multithreading). These are the results I got:
> xgb_par[3]
 elapsed 
 283.691 
> just_seq[3]
 elapsed 
 276.704 
> mc_par[3]
 elapsed 
  89.074 
> just_seq[3]/mc_par[3]
 elapsed 
3.106451 
> just_seq[3]/xgb_par[3]
  elapsed 
0.9753711 
> xgb_par[3]/mc_par[3]
 elapsed 
3.184891 
In the end, it turned out that, both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.
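For reference, the only change on the last line was passing nthread=1 through foo's dots, so that run used caret's parallelism with single-threaded xgboost (a sketch of the call, not my exact script):

registerDoMC(cores = 5)
## caret parallelizes the folds; each xgboost fit stays single-threaded
mc_par <- system.time(foo(nthread = 1))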
It is not simple to project what the best strategy would be. My (biased) thought is that you should parallelize the process that takes the longest. Here, that would be the resampling loop since an open thread/worker would invoke the model many times. The opposite approach of parallelizing the model fit will start and stop workers repeatedly and theoretically slows things down. Your mileage may vary.
I don't have OpenMP installed but there is code below to test (if you could report your results, that would be helpful).
library(caret)
library(plyr)
library(xgboost)
library(doMC)

## fit a randomly searched xgbTree model; extra arguments (e.g. nthread)
## are passed through train() on to xgboost
foo <- function(...) {
  set.seed(2)
  mod <- train(Class ~ ., data = dat,
               method = "xgbTree", tuneLength = 50,
               ..., trControl = trainControl(search = "random"))
  invisible(mod)
}

set.seed(1)
dat <- twoClassSim(1000)

## sequential baseline
just_seq <- system.time(foo())

## parallelism inside xgboost only
## I don't have OpenMP installed
xgb_par <- system.time(foo(nthread = 5))

## parallelism in caret's resampling loop
registerDoMC(cores = 5)
mc_par <- system.time(foo())
My results (without OpenMP):
> just_seq[3]
elapsed
326.422
> xgb_par[3]
elapsed
319.862
> mc_par[3]
elapsed
102.329
>
> ## Speedups
> xgb_par[3]/mc_par[3]
elapsed
3.12582
> just_seq[3]/mc_par[3]
elapsed
3.189927
> just_seq[3]/xgb_par[3]
elapsed
1.020509
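If you adapt this to your own machine, one small tweak (a sketch, not part of the benchmark above) is to size the back-end from the physical core count instead of hard-coding 5 workers; parallel::detectCores() ships with base R:

library(doMC)

## one worker per physical core (logical = FALSE skips hyper-threads
## where the platform can report them)
n_cores <- parallel::detectCores(logical = FALSE)
registerDoMC(cores = n_cores)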