Let's say I have n tasks that I would like to run in parallel using the foreach package. The i-th task is f(dataset_i), where f is a function whose run time depends on the size of dataset_i.
The function f can itself be parallelized, so I would like to allocate cpu_i CPUs to the i-th task and run f(dataset_i) with cpu_i CPUs.
Is something like this possible, and if so, how can I do it?
R-level parallelism is always process parallelism. Tasks are not pinned to
different CPU cores; each task runs in a separate R session, i.e. a separate
process. Importantly, however, all of those R sessions still have access to
all cores. So if you have a function f() that can perform its work on
multiple threads (= separate cores) by calling into native code, you can
specify, when launching each task, how many threads f() should use.
Concretely, with f = data.table::fwrite:
library(foreach)
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel

registerDoParallel()

datasets <- list(
  data.frame(foo = rnorm(1e7)),
  data.frame(foo = rnorm(2e7))
)

# Iterate over the datasets and, in lockstep, over the number of
# threads each fwrite() call is allowed to use.
foreach(data = datasets, ncpu = c(1, 2)) %dopar% {
  file <- withr::local_tempfile()  # temporary file, deleted on exit
  data.table::fwrite(data, file, nThread = ncpu) |> system.time()
}
#> [[1]]
#> user system elapsed
#> 0.72 0.06 0.85
#>
#> [[2]]
#> user system elapsed
#> 1.31 0.14 0.78
stopImplicitCluster()
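
When several tasks run concurrently, the threads they spawn all compete for the same physical cores, so it can help to budget them explicitly. Below is a minimal sketch of one way to do that, reusing the datasets list from above: it caps the number of simultaneously running tasks with an explicit cluster and splits the machine's cores between the tasks roughly in proportion to dataset size. The proportional rule and the names n_workers and ncpus are illustrative assumptions, not part of foreach or data.table.

library(foreach)
library(doParallel)

total_cores <- parallel::detectCores()

# Run at most two tasks at a time on an explicit cluster.
n_workers <- 2
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Illustrative policy: give each task a share of the cores
# proportional to its dataset's size, with at least one thread each.
sizes <- vapply(datasets, nrow, integer(1))
ncpus <- pmax(1, floor(total_cores * sizes / sum(sizes)))

foreach(data = datasets, ncpu = ncpus) %dopar% {
  file <- withr::local_tempfile()
  system.time(data.table::fwrite(data, file, nThread = ncpu))
}

stopCluster(cl)

Keeping the total at or below the machine's core count is the point of the sizing step: if the threads of all concurrently running tasks add up to more than the available cores, the oversubscription usually slows things down rather than speeding them up.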