I am trying to use the parallel package in R to send four different function calls to four different processors but am really lost as to how to assign different cores to do different work. I've read through the documentation for the parallel package, doParallel, Rmpi, and foreach in R. I've seen many posts using mclapply for calling different functions with the same argument. I'd like to call the same function with different arguments.
This is pseudocode of what I'd like to accomplish:
BEGIN parallel (core)
  if (core == 1)
    foo(5, 4, 1/2, 3, "a")
  if (core == 2)
    foo(5, 3, 1/3, 1, "b")
  if (core == 3)
    foo(5, 4, 1/4, 1, "c")
  if (core == 4)
    foo(5, 2, 1/5, 0, "d")
END parallel
This seems to be a perfect application of parallel computing, since these four separate function calls can run independently to solve the problem I am working on. I don't know how to do this in R, though.
Several R packages support parallelization. The parallel package runs tasks in parallel by letting you allocate cores to R: you find the number of cores on the system and allocate all of them, or a subset, to a cluster.
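As a rough sketch of that workflow (the worker count of 2 and the squaring task are arbitrary choices for illustration, not something from the question):

```r
library(parallel)

# Number of logical cores reported by the system (can be NA on some platforms).
n_cores <- detectCores()

# Use a subset of the cores: at most 2 here, purely for illustration.
n_workers <- if (is.na(n_cores)) 1L else min(2L, n_cores)

cl <- makeCluster(n_workers)

# Square the numbers 1..4 on the workers; results come back in input order.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always release the worker processes when done.
stopCluster(cl)
```

The cluster is a set of separate R processes, so anything the workers need must be passed as an argument or exported to them.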
The parallel package, which has shipped with R since version 2.14.0, provides the mclapply() function, a drop-in replacement for lapply(). The "mc" stands for "multicore": the function distributes the lapply tasks across multiple CPU cores to be executed in parallel. It relies on forking, so on Windows it only works with mc.cores = 1.
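For the case in the question (the same function with different arguments), one possible approach is to iterate over a list of argument sets and use do.call(). This is only a sketch: the foo() defined here is a hypothetical stand-in, since the real foo() is not shown.

```r
library(parallel)

# Hypothetical stand-in for the question's foo(); the real one is not shown.
foo <- function(n, k, p, m, label) paste(label, n * k * p + m)

# One list of arguments per call.
arg_sets <- list(
  list(5, 4, 1/2, 3, "a"),
  list(5, 3, 1/3, 1, "b"),
  list(5, 4, 1/4, 1, "c"),
  list(5, 2, 1/5, 0, "d")
)

# mclapply() forks, which Windows does not support; fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 4L

# Each element of arg_sets becomes one call to foo(), possibly on its own core.
results <- mclapply(arg_sets, function(a) do.call(foo, a), mc.cores = n_cores)
```

Note that mclapply() does not let you pin a given call to a given core; the operating system decides where each forked process runs.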
The parallel package comes with your R installation. It combines two older packages, multicore and snow, and many of its functions share names with functions in those packages.
If you are on a single host, a simple way to make use of the extra cores is to run several R instances at the same time. The operating system will normally schedule each R instance on a different core when cores are available. In Linux, just open several terminal windows, then type R in each one to start R.
You could use the clusterApply function from the parallel package:
library(parallel)
# Start four worker processes
cl <- makeCluster(4)
# Make foo available on every worker
clusterExport(cl, "foo")
# One index per worker: 1, 2, 3, 4
cores <- seq_along(cl)
# Worker i receives the value i and picks its own call
r <- clusterApply(cl[cores], cores, function(core) {
  if (core == 1) {
    foo(5, 4, 1/2, 3, "a")
  } else if (core == 2) {
    foo(5, 3, 1/3, 1, "b")
  } else if (core == 3) {
    foo(5, 4, 1/4, 1, "c")
  } else if (core == 4) {
    foo(5, 2, 1/5, 0, "d")
  }
})
This is very similar to your pseudocode and demonstrates how you can direct particular tasks to particular cluster workers using clusterApply. Note that by changing the value of cores, you can execute on any subset of the cluster workers that you choose.
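To illustrate the subset idea, here is a small sketch that runs on workers 2 and 4 only. The name demo_cl and the choice of workers are assumptions made for this example, and each worker just reports its process ID instead of calling foo:

```r
library(parallel)

# Separate demo cluster so it does not interfere with cl above.
demo_cl <- makeCluster(4)

# Pick workers 2 and 4 only; the other two stay idle.
cores <- c(2, 4)

# Each selected worker receives its own index and reports its process ID.
r_subset <- clusterApply(demo_cl[cores], cores, function(core) {
  list(core = core, pid = Sys.getpid())
})

stopCluster(demo_cl)
```

Because there are as many tasks as selected workers, each task goes to a different worker process, which shows up as different process IDs in the result.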
If a "core ID" isn't really important, you can pass different arguments to the function by iterating over vectors for each of the arguments using the foreach package:
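library(doParallel)
# Reuse the cluster created above as the foreach backend
registerDoParallel(cl)
# Iteration i takes the i-th element of each argument vector
r2 <- foreach(a1=c(5,5,5,5), a2=c(4,3,4,2), a3=c(1/2,1/3,1/4,1/5),
              a4=c(3,1,1,0), a5=c("a","b","c","d")) %dopar% {
  foo(a1, a2, a3, a4, a5)
}
# Shut down the workers when you are finished
stopCluster(cl)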
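If you would rather stay within the parallel package, clusterMap offers a similar element-wise iteration over argument vectors. This is a sketch, not part of the original answer: foo is a hypothetical stand-in and map_cl is a name chosen for this example.

```r
library(parallel)

# Hypothetical stand-in for foo(); the real one is not shown in the question.
foo <- function(n, k, p, m, label) paste(label, n * k * p + m)

# Separate cluster for this example.
map_cl <- makeCluster(4)

# Call i uses the i-th element of each vector, like mapply() but on the workers.
r3 <- clusterMap(map_cl, foo,
                 c(5, 5, 5, 5), c(4, 3, 4, 2), c(1/2, 1/3, 1/4, 1/5),
                 c(3, 1, 1, 0), c("a", "b", "c", "d"))

stopCluster(map_cl)
```

Since foo is passed to clusterMap directly, it is sent to the workers along with each task, so no clusterExport call is needed in this case.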