
Parallelization in R: how to "source" on every node?

I have created parallel workers (all running on the same machine) using:

MyCluster = makeCluster(8)

How can I make each of these 8 nodes source an R file I wrote? I tried:

clusterCall(MyCluster, source, "myFile.R")
clusterCall(MyCluster, 'source("myFile.R")')

I also tried several similar variants, but none of them worked. Can you please help me find the mistake?

Thank you very much!

Bernd asked Feb 05 '14 17:02



2 Answers

The following code serves your purpose:

library(parallel)

cl <- makeCluster(4)
clusterCall(cl, function() { source("test.R") })

## do some parallel work

stopCluster(cl)

You can also use clusterEvalQ() to do the same thing:

library(parallel)

cl <- makeCluster(4)
clusterEvalQ(cl, source("test.R"))

## do some parallel work

stopCluster(cl)

However, there is a subtle difference between the two methods: clusterCall() runs a function on each node, while clusterEvalQ() evaluates an expression on each node. If the list of files to source is held in a variable, clusterCall() is easier to use, because clusterEvalQ(cl, expr) captures expr unevaluated and sends it to the workers as-is, so a variable that exists only on the master is not available there.
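To illustrate the variable-list case, here is a minimal sketch. The file names helper_a.R and helper_b.R and the functions they define are hypothetical; they are written to a temporary directory only so the example is self-contained. It assumes all workers run on the same machine, so the absolute paths are valid on every node.

```r
library(parallel)

# Hypothetical helper files, created here only to make the sketch runnable.
files <- file.path(tempdir(), c("helper_a.R", "helper_b.R"))
writeLines("helper_a <- function(x) x + 1", files[1])
writeLines("helper_b <- function(x) x * 2", files[2])

cl <- makeCluster(2)

# clusterCall() passes `files` as an ordinary argument, so its value is
# serialized on the master and sent to every worker.
clusterCall(cl, function(paths) {
  for (p in paths) source(p)  # defines the helpers in the worker's global environment
  NULL                        # avoid shipping source()'s return value back
}, files)

# Check that the sourced functions are now available on each worker.
results <- clusterEvalQ(cl, helper_b(helper_a(1)))

stopCluster(cl)
```

Returning NULL from the worker function is a design choice rather than a requirement: it keeps the result sent back to the master small.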

Kun Ren answered Nov 10 '22 03:11



If you use a command to source a local file, make sure the file actually exists on every node.

Otherwise, place the file on a network share or NFS mount, and source it by its absolute path.

Better still, and this is the standard answer: write a package, install it on each node, and then just call library() or require().
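For the file-based options, one way to check that the file is visible on every node before sourcing it might look like the sketch below. The file shared_code.R and the function my_fun() are hypothetical stand-ins, created in a temporary directory so the example runs on a single machine; on a real multi-machine cluster the path would point to the shared location instead.

```r
library(parallel)

# Hypothetical file standing in for the script on a shared location.
path <- file.path(tempdir(), "shared_code.R")
writeLines('my_fun <- function() "loaded"', path)
path <- normalizePath(path, mustWork = TRUE)  # absolute path

cl <- makeCluster(2)

# Ask every worker whether it can see the file at that absolute path.
present <- unlist(clusterCall(cl, file.exists, path))
if (!all(present)) {
  stopCluster(cl)
  stop("file is not visible on every node: ", path)
}

# Source the file on each worker, then call the function it defines.
clusterCall(cl, function(p) { source(p); NULL }, path)
res <- clusterEvalQ(cl, my_fun())

stopCluster(cl)
```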

Dirk Eddelbuettel answered Nov 10 '22 04:11
