How to export objects to parallel clusters within a function in R

I am writing a function that combines and organizes data and then runs MCMC chains in parallel using the parallel package that ships with base R. My function is below.

dm100zip <- function(y, n.burn = 1, n.it = 3000, n.thin = 1) {
  y <- array(c(as.matrix(y[,2:9]), as.matrix(y[ ,10:17])), c(length(y$Plot), 8, 2))
  nplots <- nrow(y)
  ncap1 <- apply(y[,1:8, 1],1,sum)
  ncap2 <- apply(y[,1:8, 2],1,sum)
  ncap <- as.matrix(cbind(ncap1, ncap2))
  ymax1 <- apply(y[,1:8, 1],1,sum)
  ymax2 <- apply(y[,1:8, 2],1,sum)

  # Bundle data for JAGS/BUGS
  jdata100 <- list(y=y, nplots=nplots, ncap=ncap)

  # Set initial values for Gibbs sampler
  inits100 <- function(){
    list(p0=runif(1, 1.1, 2),
      p.precip=runif(1, 0, 0.1),
      p.day = runif(1, -.5, 0.1))
  }

  # Set parameters of interest to monitor and save
  params100 <- c("N", "p0")

  # Run JAGS in parallel for improved speed
  CL <- makeCluster(3) # start one worker per desired chain
  clusterExport(cl=CL, list("jdata100", "params100", "inits100", "ymax1", "ymax2", "n.burn", "n.it", "n.thin")) # make data available to JAGS on each worker
  clusterSetRNGStream(cl = CL, iseed = 5312)

  out <- clusterEvalQ(CL, {
    library(rjags)
    load.module('glm')
    jm <- jags.model("dm100zip.txt", jdata100, inits100, n.adapt = n.burn, n.chains = 1)
    fm <- coda.samples(jm, params100, n.iter = n.it, thin = n.thin)
    return(as.mcmc(fm))

  })

  out.list <- mcmc.list(out) # group output from each core into one list
  stopCluster(CL)

  return(out.list)
}

When I run the function, I get an error saying that n.burn, n.it, and n.thin are not found when clusterExport is called. For example,

dm100zip.list.nain <- dm100zip(NAIN, n.burn = 1, n.it = 3000, n.thin = 1) # returns error

If I first assign values to each of them in the global environment, the function uses those values and runs without error. For example,

n.burn = 1
n.it = 1000
n.thin = 1
dm100zip.list.nain <- dm100zip(NAIN, n.burn = 1, n.it = 3000, n.thin = 1) 

This runs, but it uses n.it = 1000 instead of 3000.

Why does clusterExport use the objects in the global environment rather than the values assigned inside the function from which it is called? Is there a way around this?

asked Mar 30 '14 by djhocking


2 Answers

By default, clusterExport looks up the variables named in "varlist" in the global environment. In your case, it should look in the local (evaluation) environment of the dm100zip function instead. To make it do that, pass environment() as the clusterExport "envir" argument; when called inside dm100zip, environment() returns that function's environment:

clusterExport(cl=CL, c("jdata100", "params100", "inits100", "ymax1",
                       "ymax2", "n.burn", "n.it", "n.thin"),
              envir=environment())

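Note that variables in "varlist" that are defined only in the global environment will still be found, because the lookup falls back to the environments enclosing dm100zip (the global environment, if the function was defined there). Values defined inside dm100zip take precedence.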
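A minimal, self-contained sketch of the difference, using only the parallel package that ships with R. The helper values_on_workers and its use_local_env flag are hypothetical names for illustration, and the sketch assumes a 2-worker local PSOCK cluster can be started on your machine.

```r
library(parallel)

# Hypothetical helper (not from the question): export n.it to two workers
# and return the value each worker ends up seeing.
values_on_workers <- function(n.it = 3000, use_local_env = TRUE) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)  # shut the workers down even on error
  if (use_local_env) {
    # Look up n.it in this function's evaluation environment first
    clusterExport(cl, "n.it", envir = environment())
  } else {
    # Default envir = .GlobalEnv: the local argument is ignored
    clusterExport(cl, "n.it")
  }
  # clusterExport assigned n.it in each worker's global environment
  unlist(clusterEvalQ(cl, n.it))
}

n.it <- 1000  # a stray global, like the one set in the question
with_global <- values_on_workers(3000, use_local_env = FALSE)
with_local  <- values_on_workers(3000, use_local_env = TRUE)
print(with_global)  # [1] 1000 1000  (workers see the global value)
print(with_local)   # [1] 3000 3000  (workers see the function argument)
```

The on.exit() call is a design choice beyond the original function: it guarantees stopCluster() runs even if an export fails partway through, so no orphaned worker processes are left behind.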

answered Nov 01 '22 by Steve Weston


Because R evaluates function arguments lazily, you need to ensure that any default arguments actually exist in the function's execution environment. The R core authors included the force function for exactly this purpose: it is defined simply as function(x) x, and calling it forces the argument's promise to be evaluated into a value. Try making the following modification:

dm100zip <- function(y, n.burn = 1, n.it = 3000, n.thin = 1) {
  force(n.burn); force(n.it); force(n.thin)
  # The rest of your code as above...
}
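To see the lazy evaluation that force() deals with, here is a small sketch that records the order in which things happen. The names events, note and f are hypothetical. It only illustrates when a promise is evaluated; forcing the arguments does not by itself change where clusterExport looks up "varlist", which is still the global environment unless "envir" is supplied.

```r
events <- character()                      # records the order of evaluation
note <- function(s) events <<- c(events, s)

f <- function(x) {
  note("enter f")                          # x is still an unevaluated promise
  force(x)                                 # the promise for x is evaluated here
  note("after force")
  x
}

val <- f({ note("evaluate x"); 42 })
# events is now c("enter f", "evaluate x", "after force") and val is 42
```

The argument expression runs only when force(x) is reached, not when f is called, which is what "lazy evaluation" means here.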

For a more detailed explanation of these issues, consult the Lazy evaluation section of the Functions chapter in Hadley Wickham's Advanced R.

answered Nov 01 '22 by Robert Krzyzanowski