Strange environment behavior in parallel plyr

Recently, I created an object factor=1 in my workspace, not knowing that there is a function factor in the base package.

What I intended to do was to use the variable factor within a parallel loop, e.g.,

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1

llply(
  as.list(1:2),
  function(x) factor*x,
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
     )

This, however, results in an error that took me some time to understand. It seems that plyr creates the object factor in its environment exportEnv, but puts base::factor there instead of the user-provided object. See the following example:

llply(
  as.list(1:2),
  function(x) {
    function_env=environment();
    global_env=parent.env(function_env);
    export_env=parent.env(global_env);
    list(
      function_env=function_env,
      global_env=global_env,
      export_env=export_env,
      objects_in_exportenv=unlist(ls(envir=export_env)),
      factor_found_in_envs=find("factor"),
      factor_in_exportenv=get("factor",envir=export_env)
      )
    },
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
  )

stopCluster(workers)

If we inspect the output of llply, we see that the line factor_in_exportenv=get("factor",envir=export_env) does not return 1 (corresponding to the user-provided object) but the function definition of base::factor.

Question 1) How can I understand this behavior? I would have expected the output to be 1.

Question 2) Is there a way to get a warning from R if I assign a new value to a name that is already defined in another package (such as factor in my case)?

cryo111 asked Jul 24 '13 16:07


1 Answer

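The llply function calls foreach under the hood. foreach uses parent.frame() to determine the environment in which to evaluate the loop body and look up exported variables. What is the parent frame in llply's case? It is llply's own function environment, which doesn't have factor defined. A lookup that starts there continues through the plyr namespace and finds base::factor before it ever reaches the global environment, so the base function gets exported instead of your variable.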
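The lookup order can be checked in plain R without starting a cluster. The following is only a sketch: it uses the stats namespace as a stand-in for plyr's namespace (an assumption made so the example runs without extra packages), and the names from_global and from_namespace are purely illustrative.

```r
# Create the user object in the global environment
# (equivalent to typing factor <- 1 at the prompt).
assign("factor", 1, envir = globalenv())

# A lookup that starts in the global environment finds the user object.
from_global <- get("factor", envir = globalenv())

# A lookup that starts in a package namespace walks
# namespace -> imports -> base namespace -> global environment,
# so it stops at base::factor and never sees the user object.
# stats is used here as a stand-in for plyr (assumption).
ns <- asNamespace("stats")
from_namespace <- get("factor", envir = ns)

print(from_global)
print(is.function(from_namespace))
```

The first lookup returns the user-defined value, while the second returns a function, which mirrors what the question observed in exportEnv.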

Instead of using llply, why not use foreach directly?

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1
foreach(x=1:2) %dopar% {factor*x}

stopCluster(workers)

Note that you don't even need the .export argument here, since foreach exports factor automatically in this case.

thc answered Oct 03 '22 09:10