
clusterExport, environment and variable scoping

I wrote a function in which I define variables and load objects. Here's a simplified version:

fn1 <- function(x) {
  load("data.RData") # a vector named "data"
  source("myFunctions.R")
  library(raster)
  library(rgdal)

  a <- 1
  b <- 2
  r1 <- raster(ncol = 10, nrow = 10)
  r1 <- init(r1, fun = runif)
  r2 <- r1 * 100
  names(r1) <- "raster1"
  names(r2) <- "raster2"
  m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]

  c <- fn2(m)
}

Function "fn2" can be found in "myFunctions.R" and is defined as:

fn2 <- function(x) {
  fn3 <- function(y) {
    x[[y]] * 100 * data
  }

  cl <- makeSOCKcluster(8)   
  clusterExport(cl, list("x"), envir = environment()) 
  clusterExport(cl, list("a", "b", "data")) 
  clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b))) 
  f <- parLapply(cl, names(x), fn3)  
  stopCluster(cl)
}

Now, when I run fn1, I get an error like this:

Error in get(name, envir = envir) : object 'a' not found

From what I understand from ?clusterExport, the default value for envir is .GlobalEnv, so I would assume that "a" and "b" would be accessible to fn2. However, it doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?

So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?

Thanks a lot for your help.

Guilôme asked Nov 25 '13 23:11



1 Answer

You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport looks for the variables in .GlobalEnv by default, whereas fn1 defines them in its own local environment, not in .GlobalEnv.
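The behaviour can be reproduced without raster or snow. The sketch below is only an illustration: it assumes the base parallel package (whose clusterExport has the same .GlobalEnv default), and outer_fn and local_only_value are hypothetical names chosen so that nothing with that name exists in the global environment.

```r
library(parallel)

outer_fn <- function() {
  local_only_value <- 1          # lives in outer_fn's frame, not in .GlobalEnv
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))

  # Default envir = .GlobalEnv: the variable is not found there
  default_attempt <- tryCatch(
    {
      clusterExport(cl, "local_only_value")
      "exported"
    },
    error = function(e) conditionMessage(e)
  )

  # Explicit envir: look the variable up in the current function's frame
  clusterExport(cl, "local_only_value", envir = environment())
  on_workers <- clusterEvalQ(cl, local_only_value)

  list(default_attempt = default_attempt, on_workers = on_workers)
}

res <- outer_fn()
print(res$default_attempt)   # the error message from the failed export
print(res$on_workers)        # the value as seen on each worker
```

The first export fails with an "object ... not found" message, while the second succeeds because envir points at the frame where the variable was actually created.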

An alternative is to pass the local environment of fn1 to fn2, and specify that environment to clusterExport. The call to fn2 would be:

c <- fn2(m, environment())

If the arguments to fn2 are function(x, env), then the call to clusterExport would be:

clusterExport(cl, list("a", "b", "data"), envir = env)
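Putting the two pieces together, a minimal sketch might look like the following. It is not the original code: it assumes the base parallel package rather than snow, uses a plain named list as a stand-in for the raster stack, and leaves out "data" and the raster options so that it runs without extra packages.

```r
library(parallel)

# fn2 receives the caller's environment as a second argument (env)
fn2 <- function(x, env) {
  fn3 <- function(y) {
    # x travels with fn3's closure; a and b are found in the worker's
    # global environment after the export below
    x[[y]] * a * b
  }

  cl <- makeCluster(2)
  on.exit(stopCluster(cl))

  # Look up a and b in the environment handed over by fn1
  clusterExport(cl, c("a", "b"), envir = env)
  parLapply(cl, names(x), fn3)
}

fn1 <- function() {
  a <- 2
  b <- 3
  m <- list(first = 1:3, second = 4:6)   # stand-in for the raster stack
  fn2(m, environment())                  # pass fn1's local environment
}

result <- fn1()
print(result)
```

Using on.exit(stopCluster(cl)) is a design choice of this sketch so that the workers are shut down even if the export or the parallel call fails.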

Since environments are passed by reference, there should be no performance problem doing this.
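To illustrate what "passed by reference" means here, a small sketch with hypothetical function names: a function that receives an environment works on the caller's variables directly rather than on a copy. Note that clusterExport still has to serialize the exported values to send them to the workers; only the hand-off of the environment between the two functions is free.

```r
# Hypothetical helper: changes a variable in the environment it is given
modify_env <- function(env) {
  assign("a", 99, envir = env)   # writes into the caller's environment in place
  invisible(NULL)
}

caller <- function() {
  a <- 1
  modify_env(environment())      # hand over this function's own frame
  a                              # reflects the change made by modify_env
}

value_after <- caller()
print(value_after)
```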

Steve Weston answered Oct 04 '22 04:10
