
parSapply not finding objects in global environment

I am trying to run code on several cores (I tried both the snow and parallel packages). I have the following:

cl <- makeCluster(2)
y  <- 1:10
sapply(1:5, function(x) x + y)  # Works
parSapply(cl, 1:5, function(x) x + y)

The last line returns the error:

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: object 'y' not found

Clearly parSapply isn't finding y in the global environment. Any ways to get around this? Thanks.

Charlie asked Apr 10 '12 20:04


2 Answers

The worker nodes are separate R processes, so they don't know about the y in the global environment on the master. You need to give it to them, either by passing it as an argument or by exporting it.

library(parallel)
cl <- makeCluster(2)
y  <- 1:10
# Option 1: add y to the function definition and pass it through parSapply
parSapply(cl, 1:5, function(x, y) x + y, y)
# Option 2: export y to the global environment of each node,
# then call your original code
clusterExport(cl, "y")
parSapply(cl, 1:5, function(x) x + y)
# shut down the workers when finished
stopCluster(cl)
Joshua Ulrich answered Nov 05 '22 09:11


It is worth mentioning that your example will work if parSapply is called from within a function. The real issue is not where parSapply is called, but where the function function(x) x + y is created. For example, the following code works correctly:

library(parallel)
fun <- function(cl, y) {
  parSapply(cl, 1:5, function(x) x + y)
}
cl <- makeCluster(2)
fun(cl, 1:10)
stopCluster(cl)

This is because a function created inside another function is serialized along with the local environment in which it was created, so y travels to the workers with it. A function created in the global environment is serialized without the global environment, so the workers never receive y. This can be useful at times, but it can also lead to a variety of problems if you're not aware of the issue.
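One such problem is size: everything in the enclosing local environment is shipped to the workers, whether or not the function uses it. The sketch below illustrates this without a cluster by comparing serialized sizes. The names make_worker_fun, big, f_local and f_global are made up for the illustration, and f_global is pinned to the global environment explicitly so the comparison also holds if the code is sourced from inside another function.

```r
# Hypothetical factory: 'big' is never used by the returned function,
# but it lives in the environment that the function closes over.
make_worker_fun <- function() {
  big <- numeric(1e6)  # roughly 8 MB of doubles
  function(x) x + 1
}

f_local <- make_worker_fun()

# Same body, but attached to the global environment, which is
# serialized as a reference rather than by content.
f_global <- function(x) x + 1
environment(f_global) <- globalenv()

size_local  <- length(serialize(f_local, NULL))
size_global <- length(serialize(f_global, NULL))

# The locally created function carries 'big' along with it.
print(c(local = size_local, global = size_global))
print(ls(environment(f_local)))
```

If a function created inside another function must be sent to the workers, one option is to remove large unneeded objects from the local environment first, or to define the worker function at the top level and pass what it needs as arguments.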

Steve Weston answered Nov 05 '22 10:11