Parallel within parallel

Let's say I have n tasks that I would like to run in parallel using the foreach package. The i-th task is defined by f(dataset_i), where f is a function whose running time depends on the size of dataset_i.

Function f itself can be parallelized, so I would like to allocate cpu_i CPUs to the i-th task and run f(dataset_i) on those cpu_i CPUs.

Is something like this possible, and if so, how?

asked May 02 '26 by det

1 Answer

R-level parallelism is always process parallelism. Tasks are not sent to different CPU cores directly; instead they are sent to separate R sessions, each running in its own process. Importantly, however, all of those R sessions still have access to all cores. So if you have a function f() that can itself perform a task on multiple threads (i.e. separate cores) by calling native code, you should specify, when launching each task, how many threads f() is allowed to use.

Concretely, with f = data.table::fwrite:

library(foreach)
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel

registerDoParallel()

datasets <- list(
  data.frame(foo = rnorm(1e7)),
  data.frame(foo = rnorm(2e7))
)

foreach(data = datasets, ncpu = c(1, 2)) %dopar% {
  file <- withr::local_tempfile()
  data.table::fwrite(data, file, nThread = ncpu) |> system.time()
}
#> [[1]]
#>    user  system elapsed 
#>    0.72    0.06    0.85 
#> 
#> [[2]]
#>    user  system elapsed 
#>    1.31    0.14    0.78

stopImplicitCluster()
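The same per-task thread budget can also be applied without any I/O, as a minimal sketch: data.table exposes setDTthreads() and getDTthreads() (real data.table functions), so each worker can cap the native thread pool for its own R session before doing its work. The specific thread counts c(1, 2) here are illustrative:

```r
library(foreach)
library(doParallel)

registerDoParallel(2)

# Each worker limits data.table's thread pool inside its own R session,
# so task i effectively runs with ncpu_i threads.
res <- foreach(ncpu = c(1, 2), .combine = c) %dopar% {
  data.table::setDTthreads(ncpu)  # affects only this worker's session
  data.table::getDTthreads()      # report back the setting that took effect
}

stopImplicitCluster()
res
```

Note that setDTthreads() may cap the requested count to the cores actually available on the machine, so the reported values can be lower than what was asked for.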
answered May 03 '26 by Mikko Marttila


