I am working on a project that requires parallel processing in R, and I am new to the doParallel package. What I would like to do is run a parallelized foreach loop; due to the nature of the problem, this loop needs to be executed many times. The problem I am having is that I use cppFunction() and cfunction() within the loop.
The current workaround is to call clusterEvalQ() on the cluster to compile the relevant functions. However, this is extremely slow (~10 seconds for 4 cores). I have included the relevant code below. Is there any way to speed this up? Thanks.
clusterEvalQ(cl, {
library("inline")
library("Rcpp")
source("C_functions.R")
})
Yes, there is a way to speed it up by taking the compilation hit only once.
In particular, move all the compiled code into an R package. Install that package on the cluster, load it on each worker, and call the packaged function inside the parallel code.
This is required because C++ functions compiled into R are session-specific: each worker session must compile its own copy, and the compilation is the "expensive" part. Packaging the code moves the compilation to install time, so it is paid only once rather than on every worker every run.
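As a rough sketch of what the worker-side code then looks like (myCFunctions and my_cpp_fun are placeholder names for your package and its exported function):

library(doParallel)

cl <- makeCluster(4)
registerDoParallel(cl)

# Loading the package is fast: the C++ code was compiled once, at install time.
clusterEvalQ(cl, library(myCFunctions))

res <- foreach(i = 1:100, .combine = c) %dopar% {
  my_cpp_fun(i)  # already-compiled function from the (hypothetical) package
}

stopCluster(cl)

Alternatively, foreach's .packages argument, e.g. foreach(i = 1:100, .packages = "myCFunctions"), loads the package on each worker for you, making the clusterEvalQ() call unnecessary.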
Also, do not use the inline package; use Rcpp Attributes instead.
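With Rcpp Attributes, a C++ file in the package's src/ directory is simply annotated with // [[Rcpp::export]]; for illustration (timesTwo is a placeholder):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
  // exported to R automatically; no inline/cfunction boilerplate needed
  return x * 2;
}

Running Rcpp::compileAttributes() on the package then generates the R-level wrapper in R/RcppExports.R, so timesTwo() is available as an ordinary R function once the package is installed and loaded.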