
How to clusterExport a function without its evaluation environment

I am trying to use parLapply inside of another function not defined in the global environment. The worker function makes use of a list of other functions that I want to clusterExport beforehand, which are also not defined in the global environment. My problem is that both functions export their evaluation environments to the clusters, which are huge and not needed.

Let us call the worker function workerFunction and the function list functionList.

    workerFunction <- function(i) {
        intermediateOutput <- functionList[[i]](y)
        result <- otherCalculations(intermediateOutput)
        return(result)    
    }

    library(parallel)
    cl <- makeCluster(detectCores())
    environment(workerFunction) <- .GlobalEnv
    environment(functionList) <- .GlobalEnv
    clusterExport(cl, varlist=c("functionList", "y"), envir=.GlobalEnv)
    output <- parLapply(cl, inputVector, workerFunction)

I get:

Error in get(name, envir = envir) (from <text>#53) : object 'functionList' not found

If I don't set environment(functionList) <- .GlobalEnv, then the huge enclosing environment of functionList is exported to the clusters. Why can't R find functionList in the global environment?

Ronert asked Jul 01 '13 10:07



1 Answer

It's hard to guess the problem without a complete example, but I suspect the error message is coming from clusterExport rather than parLapply. That would happen if functionList was defined in a function rather than in the global environment, since the clusterExport envir argument specifies the environment from which the variables are exported.
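The lookup can be checked without a cluster. This is a minimal sketch, assuming clusterExport resolves each name with get(name, envir = envir), which is what the error message in the question suggests. The names exportDemo and localFunctionList are made up for illustration.

```r
# Hypothetical stand-in for a function that defines the list locally.
exportDemo <- function() {
    localFunctionList <- list(double = function(x) 2 * x)

    # Looking in the global environment fails, as in the question;
    # the error message is captured as a string instead of stopping.
    fromGlobal <- tryCatch(
        get("localFunctionList", envir = globalenv()),
        error = function(e) conditionMessage(e)
    )

    # environment() here is the frame of exportDemo, where the list lives.
    fromLocal <- get("localFunctionList", envir = environment())

    list(fromGlobal = fromGlobal, fromLocal = fromLocal)
}

lookup <- exportDemo()
```

Here lookup$fromGlobal holds the error message from the failed lookup, while lookup$fromLocal is the list itself.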

To export variables defined in a function, from that same function, you would use:

    clusterExport(cl, varlist=c("functionList", "y"), envir=environment())

I'm just guessing this might be a problem for you since I don't know how or where you defined functionList. Note that clusterExport always assigns the variables to the global environment of the cluster workers.

I'm also suspicious of the way that you are apparently setting the environment of a list: that seems to be legal, but I don't think it will change the environment of functions in that list. In fact, I suspect that exporting functions to the workers in a list may have other problems that you haven't encountered yet. I would use something like this:
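The point about the list can be verified locally. The sketch below assumes that assigning an environment to a non-function only attaches it to the list itself and leaves the elements alone. The helper makeFunctionList and the element names inc and dbl are hypothetical.

```r
# makeFunctionList is a hypothetical helper; its frame holds a large object.
makeFunctionList <- function() {
    big <- numeric(1e6)
    list(inc = function(x) x + 1, dbl = function(x) x * 2)
}

functionList <- makeFunctionList()
enclosing <- environment(functionList$inc)

# Legal on a list, but the functions inside keep their enclosing environment.
environment(functionList) <- globalenv()
unchanged <- identical(environment(functionList$inc), enclosing)

# Reset the environment of each function individually instead.
resetList <- lapply(functionList, function(f) {
    environment(f) <- globalenv()
    f
})
```

After this, unchanged is TRUE, and only the functions in resetList have the global environment as their enclosure. Functions reset this way can no longer see variables from the frame they were created in, so anything they need has to be exported separately.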

    mainFunction <- function(cl) {
        fa <- function(x) fb(x)
        fb <- function(x) fc(x)
        fc <- function(x) x
        y <- 7
        workerFunction <- function(i) {
            # Look up the function by name in the worker's global environment
            do.call(functionNames[[i]], list(y))
        }
        # Detach each function from this frame so the frame isn't sent along
        environment(workerFunction) <- .GlobalEnv
        environment(fa) <- .GlobalEnv
        environment(fb) <- .GlobalEnv
        environment(fc) <- .GlobalEnv
        functionNames <- c("fa", "fb", "fc")
        # Export from this function's environment, not the global one
        clusterExport(cl, varlist=c("functionNames", functionNames, "y"),
                      envir=environment())
        parLapply(cl, seq_along(functionNames), workerFunction)
    }

    library(parallel)
    cl <- makeCluster(detectCores())
    mainFunction(cl)
    stopCluster(cl)

Note that I've taken liberties with your example, so I'm not sure how well this corresponds with your problem.
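To see why resetting a function's environment matters, you can compare serialized sizes without starting a cluster. This is only a sketch: it assumes the parallel package sends functions to the workers in roughly the way serialize() encodes them, and makeWorker is a hypothetical factory.

```r
# makeWorker is a hypothetical factory whose frame holds a large vector.
makeWorker <- function() {
    big <- rnorm(1e6)      # about 8 MB, never used by the returned function
    function(i) i + 1
}

worker <- makeWorker()

# The closure drags its enclosing frame (including big) along.
sizeWithEnv <- length(serialize(worker, NULL))

# The global environment is serialized as a reference, not by value.
environment(worker) <- globalenv()
sizeWithoutEnv <- length(serialize(worker, NULL))
```

The first size includes the large vector from the enclosing frame; the second should be far smaller.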

Steve Weston answered Nov 06 '22 09:11
