Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Rcpp within parallel code via snow to make a cluster

Tags:

r

rcpp

snow

I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when ran in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)
like image 274
Vincent Avatar asked May 20 '11 15:05

Vincent


2 Answers

Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.

So you tried the right thing by shipping the R frontend calling that shared library to the other process (which has another temp directory !!), but that does not get the dll / so file there.

Hence the advice is to create a local package, install it and have both snow processes load and call it.

(And as always: better quality answers may be had on the rcpp-devel list which is read by more Rcpp constributors than SO is.)

like image 198
Dirk Eddelbuettel Avatar answered Oct 04 '22 01:10

Dirk Eddelbuettel


Old question, but I stumbled across it while looking through the top Rcpp tags so maybe this answer will be of use still.

I think Dirk's answer is proper when the code you've written is fully de-bugged and does what you want, but it can be a hassle to write a new package for such as small piece of code like in the example. What you can do instead is export the code block, export a "helper" function that compiles source code and run the helper. That'll make the CXX function available, then use another helper function to call it. For instance:

# Snow must still be installed, but this functionality is now in "parallel" which ships with base r.
library(parallel)

# Keep your source as an object
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
# Save the signature
sig <- signature(xbe = "numeric", g="numeric")

# make a function that compiles the source, then assigns the compiled function 
# to the global environment
c.inline <- function(name, sig, src){
    library(Rcpp)
    funCXX <- inline::cxxfunction(sig = sig, body = src, plugin="Rcpp")
    assign(name, funCXX, envir=.GlobalEnv)
}
# and the function which retrieves and calls this newly-compiled function 
c.namecall <- function(name,...){
    funCXX <- get(name)
    funCXX(...)
}

# Keep your example matrix
A <- matrix(rnorm(400), 20,20)

# What are we calling the compiled funciton?
fxname <- "TestCXX"

## Parallel
cl <- makeCluster(2, type = "PSOCK") 

# Export all the pieces
clusterExport(cl, c("src1","c.inline","A","fxname")) 

# Call the compiler function
clusterCall(cl, c.inline, name=fxname, sig=sig, src=src1)

# Notice how the function now named "TestCXX" is available in the environment
# of every node?
clusterCall(cl, ls, envir=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name=fxname, A, 0.5)
# Works with my testing

I've written a package ctools (shameless self-promotion) which wraps up a lot of the functionality that is in the parallel and Rhpc packages for cluster computing, both with PSOCK and MPI. I already have a function called "c.sourceCpp" which calls "Rcpp::sourceCpp" on every node in much the same way as above. I'm going to add in a "c.inlineCpp" which does the above now that I see the usefulness of it.

Edit:

In light of Coatless' comments, the Rcpp::cppFunction() in fact negates the need for the c.inline helper here, though the c.namecall is still needed.

src2 <- '
 NumericMatrix TestCpp(NumericMatrix xbe, int g){
        NumericMatrix xbem(xbe);
        int nrows = xbem.nrow();
        NumericVector gv(g);
        for (int i = 1; i < nrows; i++) {
            xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
        }
        return xbem;
 }
'

clusterCall(cl, Rcpp::cppFunction, code=src2, env=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name="TestCpp", A, 0.5)
like image 40
Brian Albert Monroe Avatar answered Oct 04 '22 01:10

Brian Albert Monroe