Variables in function arguments do not pass to cluster when parallel computing

I am having difficulty understanding how variables are scoped and passed to functions when using the parallel package.

library(parallel)

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  result <- parSapply(clust, 1:10, function(x){a + x})
  stopCluster(clust)
  return(result)
}

test()
[1]  2  3  4  5  6  7  8  9 10 11

x = 1
test(x)

Error in checkForRemoteErrors(val) : 
3 nodes produced errors; first error: object 'x' not found

test() works but test(x) doesn't. When I modify the function as follows, it works.

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  y = a
  result <- parSapply(clust, 1:10, function(x){y + x})
  stopCluster(clust)
  return(result)
}

x = 1
test(x)

Can someone explain what is going on in memory?

asked Jun 06 '26 19:06 by hjw


1 Answer

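This is due to lazy evaluation. The argument a is not evaluated when test() is called, only when it is first used. In test(x), a is still an unevaluated promise (the expression x, to be evaluated in the caller's global environment) at the moment parSapply() serializes the anonymous function, together with its enclosing environment, and sends it to the workers. When a worker finally evaluates a, it looks up x in its own global environment, where x does not exist, hence "object 'x' not found". With the default a = 1 the promise is just the constant 1, which evaluates fine anywhere, and in your second version y = a forces a on the master before anything is sent. You can fix it by forcing the evaluation: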

test <- function(a = 1){
    no_cores <- detectCores()-1
    clust <- makeCluster(no_cores)
    force(a)    # evaluate a on the master, before the closure is sent to the workers
    result <- parSapply(clust, 1:10, function(x){a + x})
    stopCluster(clust)
    return(result)
}

x = 1
test(x)
#  [1]  2  3  4  5  6  7  8  9 10 11
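The same mechanism can be reproduced without a cluster. The sketch below uses only base R; make_adder() and make_adder_forced() are hypothetical helpers for illustration, not part of the code above. Changing x after the closure is created but before a is used shows when the promise is actually evaluated, which is the same moment a worker evaluates it in the parallel case.

```r
# Hypothetical helpers illustrating lazy evaluation of arguments (base R only).
make_adder <- function(a) {
  function(x) a + x   # `a` is still an unevaluated promise for the caller's `x`
}

make_adder_forced <- function(a) {
  force(a)            # evaluate the promise now, while the caller's `x` is 1
  function(x) a + x
}

x <- 1
lazy   <- make_adder(x)
forced <- make_adder_forced(x)

x <- 100              # change the global `x` before `a` is ever used

lazy(1)    # 101: the promise is evaluated only now and finds the global x == 100
forced(1)  # 2:   `a` was already fixed to 1 by force()
```

Note that the inner function's own argument x does not interfere: the promise for a is evaluated in the environment it was created in (the global environment), not inside the inner function. On a worker that global environment has no x at all, so instead of picking up a different value you get the "object 'x' not found" error.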
answered Jun 09 '26 10:06 by mt1022


