I am trying to run code on several cores (I tried both the snow and parallel packages). I have
cl <- makeCluster(2)
y <- 1:10
sapply(1:5, function(x) x + y) # Works
parSapply(cl, 1:5, function(x) x + y)
The last line returns the error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'y' not found
Clearly parSapply isn't finding y in the global environment. Is there a way to get around this? Thanks.
The nodes don't know about the y in the global environment on the master. You need to pass it to them explicitly, either as a function argument or by exporting it to each node.
library(parallel)
cl <- makeCluster(2)
y <- 1:10
# Option 1: add y to the function definition and to the parSapply call
parSapply(cl, 1:5, function(x, y) x + y, y)
# Option 2: export y to the global environment of each node,
# then call your original code
clusterExport(cl, "y")
parSapply(cl, 1:5, function(x) x + y)
# shut down the worker processes when finished
stopCluster(cl)
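One related case: clusterExport looks variables up in .GlobalEnv by default, so if the value you want to export is local to a function, you have to point it at that function's environment through the envir argument. Here is a minimal sketch of that, where add_offset, run_on_nodes and offset are hypothetical names made up for illustration:

```r
library(parallel)

# Hypothetical worker function defined at top level. On each node it
# looks up 'offset' in that node's global environment.
add_offset <- function(x) x + offset

# Hypothetical wrapper: 'offset' is only an argument here, so it does
# not exist in the master's global environment.
run_on_nodes <- function(cl, offset) {
  # Export from this function's frame instead of the default .GlobalEnv
  clusterExport(cl, "offset", envir = environment())
  parSapply(cl, 1:5, add_offset)
}

cl <- makeCluster(2)
res <- run_on_nodes(cl, 1:10)
stopCluster(cl)
```

Without envir = environment(), clusterExport would search the master's global environment for offset and fail to find it.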
It is worth mentioning that your example will work if parSapply is called from within a function, although the real issue is where the function function(x) x + y is created. For example, the following code works correctly:
library(parallel)
fun <- function(cl, y) {
parSapply(cl, 1:5, function(x) x + y)
}
cl <- makeCluster(2)
fun(cl, 1:10)
stopCluster(cl)
This is because functions that are created inside other functions are serialized along with the local environment in which they were created, while functions created in the global environment are not serialized along with the global environment. This can be useful at times, but it can also lead to a variety of problems if you're not aware of the issue.
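To see the difference without starting a cluster, you can inspect the environment of each kind of function and serialize the locally created one by hand. This is only a rough sketch: make_adder, add_local and add_global are hypothetical names, and serialize() is used here as a stand-in for what happens when a function is shipped to a node.

```r
# Hypothetical factory: the returned function is created inside
# make_adder, so its enclosing environment is that call's local frame.
make_adder <- function(y) {
  force(y)  # evaluate the argument now so the value is stored in the frame
  function(x) x + y
}

add_local  <- make_adder(numeric(1e6))  # carries a large local y with it
add_global <- function(x) x + y         # created at top level

# The local frame holds y itself, without looking in parent environments
has_y <- exists("y", envir = environment(add_local), inherits = FALSE)

# When run at top level, the enclosing environment is the global one
is_global <- identical(environment(add_global), globalenv())

# Serializing add_local drags its whole local frame along, so the result
# is several megabytes even though the function body is tiny
size_local <- length(serialize(add_local, NULL))
```

This is also one of the problems alluded to above: any large object that happens to sit in the local frame gets sent to every node along with the function, whether or not the function uses it.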