Let's say I have n tasks that I would like to run in parallel using the foreach package. The i-th task is f(dataset_i), where f is a function whose run time depends on the size of dataset_i.
The function f can itself be parallelized, so I would like to allocate cpu_i CPUs to the i-th task and run f(dataset_i) with cpu_i CPUs.
Is something like this possible, and if so, how can I do it?
R-level parallelism is always process parallelism. Tasks are not pinned to
different CPU cores; each task runs in a separate R session, i.e. a separate
process. Importantly, however, all of those R sessions still have access to
all cores. So if you have a function f() that can perform its work on
multiple threads (= separate cores) by calling into native code, you can
specify, when launching each task, how many threads f() should use.
Concretely, with f = data.table::fwrite:
library(foreach)
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel

registerDoParallel()

datasets <- list(
  data.frame(foo = rnorm(1e7)),
  data.frame(foo = rnorm(2e7))
)

# Iterate over the datasets and, in lockstep, over the number of
# threads each fwrite() call is allowed to use.
foreach(data = datasets, ncpu = c(1, 2)) %dopar% {
  file <- withr::local_tempfile()  # temporary file, deleted on exit
  data.table::fwrite(data, file, nThread = ncpu) |> system.time()
}
#> [[1]]
#> user system elapsed
#> 0.72 0.06 0.85
#>
#> [[2]]
#> user system elapsed
#> 1.31 0.14 0.78
stopImplicitCluster()
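
When several tasks run concurrently, the threads they spawn all compete for the same physical cores, so it can help to budget them explicitly. Below is a minimal sketch of one way to do that, reusing the datasets list from above: it caps the number of simultaneously running tasks with an explicit cluster and splits the machine's cores between the tasks roughly in proportion to dataset size. The proportional rule and the names n_workers and ncpus are illustrative assumptions, not part of foreach or data.table.

library(foreach)
library(doParallel)

total_cores <- parallel::detectCores()

# Run at most two tasks at a time on an explicit cluster.
n_workers <- 2
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Illustrative policy: give each task a share of the cores
# proportional to its dataset's size, with at least one thread each.
sizes <- vapply(datasets, nrow, integer(1))
ncpus <- pmax(1, floor(total_cores * sizes / sum(sizes)))

foreach(data = datasets, ncpu = ncpus) %dopar% {
  file <- withr::local_tempfile()
  system.time(data.table::fwrite(data, file, nThread = ncpu))
}

stopCluster(cl)

Keeping the total at or below the machine's core count is the point of the sizing step: if the threads of all concurrently running tasks add up to more than the available cores, the oversubscription usually slows things down rather than speeding them up.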