I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads to use while fitting the models, in the sense of building the trees in parallel. Caret's train function, on the other hand, parallelizes at the resampling level, for example by running a process for each fold in a k-fold CV. Is this understanding correct? If so, is it better to:
1. Register a parallel back-end (for example with the doMC package and the registerDoMC function), set nthread=1 via caret's train function so it passes that parameter to xgboost, set allowParallel=TRUE in trainControl, and let caret handle the parallelization for the cross-validation; or
2. Disable caret's parallelization (allowParallel=FALSE and no parallel back-end registration) and set nthread to the number of physical cores, so the parallelization is contained exclusively within xgboost.

Or is there no "better" way to perform the parallelization?
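To make the two options concrete, here is roughly what I have in mind (just a sketch, not my actual setup: the simulated data, the 5-fold CV, and the count of 4 cores are placeholders):

library(caret)
library(doMC)

set.seed(1)
dat <- twoClassSim(200)   # small simulated data set just so the sketch runs

## Option 1: caret parallelizes the resampling loop, xgboost stays single-threaded
registerDoMC(cores = 4)
ctrl_1 <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit_1 <- train(Class ~ ., data = dat, method = "xgbTree",
               trControl = ctrl_1, nthread = 1)

## Option 2: no parallel back-end; all parallelism lives inside xgboost
ctrl_2 <- trainControl(method = "cv", number = 5, allowParallel = FALSE)
fit_2 <- train(Class ~ ., data = dat, method = "xgbTree",
               trControl = ctrl_2, nthread = 4)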
Edit: I ran the code suggested by @topepo, with tuneLength = 10 and search="random", and specifying nthread=1 on the last line (otherwise I understand that xgboost will use multithreading). These are the results I got:
> xgb_par[3]
 elapsed 
 283.691 
> just_seq[3]
 elapsed 
 276.704 
> mc_par[3]
 elapsed 
  89.074 
> just_seq[3]/mc_par[3]
 elapsed 
3.106451 
> just_seq[3]/xgb_par[3]
  elapsed 
0.9753711 
> xgb_par[3]/mc_par[3]
 elapsed 
3.184891 
In the end, it turned out that, both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.
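For reference, the only change on the last line was passing nthread=1 through foo's dots, so that run used caret's parallelism with single-threaded xgboost (a sketch of the call, not my exact script):

registerDoMC(cores = 5)
## caret parallelizes the folds; each xgboost fit stays single-threaded
mc_par <- system.time(foo(nthread = 1))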
It is not simple to project what the best strategy would be. My (biased) thought is that you should parallelize the process that takes the longest. Here, that would be the resampling loop since an open thread/worker would invoke the model many times. The opposite approach of parallelizing the model fit will start and stop workers repeatedly and theoretically slows things down. Your mileage may vary.
I don't have OpenMP installed but there is code below to test (if you could report your results, that would be helpful).
library(caret)
library(plyr)
library(xgboost)
library(doMC)

## fit a randomly searched xgbTree model; extra arguments (e.g. nthread)
## are passed through train() on to xgboost
foo <- function(...) {
  set.seed(2)
  mod <- train(Class ~ ., data = dat,
               method = "xgbTree", tuneLength = 50,
               ..., trControl = trainControl(search = "random"))
  invisible(mod)
}

set.seed(1)
dat <- twoClassSim(1000)

## sequential baseline
just_seq <- system.time(foo())

## parallelism inside xgboost only
## I don't have OpenMP installed
xgb_par <- system.time(foo(nthread = 5))

## parallelism in caret's resampling loop
registerDoMC(cores = 5)
mc_par <- system.time(foo())
My results (without OpenMP):
> just_seq[3]
elapsed
326.422
> xgb_par[3]
elapsed
319.862
> mc_par[3]
elapsed
102.329
>
> ## Speedups
> xgb_par[3]/mc_par[3]
elapsed
3.12582
> just_seq[3]/mc_par[3]
elapsed
3.189927
> just_seq[3]/xgb_par[3]
elapsed
1.020509
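If you adapt this to your own machine, one small tweak (a sketch, not part of the benchmark above) is to size the back-end from the physical core count instead of hard-coding 5 workers; parallel::detectCores() ships with base R:

library(doMC)

## one worker per physical core (logical = FALSE skips hyper-threads
## where the platform can report them)
n_cores <- parallel::detectCores(logical = FALSE)
registerDoMC(cores = n_cores)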