
load does not work with foreach and %dopar%

Tags: foreach, r

I ran into a problem when using foreach with %dopar% to load objects from disk into memory: the objects are not loaded when the loop uses %dopar%, although the same loop works with %do%. Below is a simple example that shows the problem.

envir = .GlobalEnv

x <- "X test"
y <- "Y test"
z <- "Z test"

save(x, file="x.RData")
save(y, file="y.RData")
save(z, file="z.RData")

rm(x)
rm(y)
rm(z)

objectsNamesVector <- c("x", "y", "z")

foreach(i=1:length(objectsNamesVector), .combine=function (...) NULL,    .multicombine=TRUE) %do% {
    print(paste("Loading object ", objectsNamesVector[i]," - ", i, " of ",    length(objectsNamesVector), sep=""))
    load(file=paste(objectsNamesVector[i], ".RData", sep=""), envir=envir)
}

print(x)
print(y)
print(z)

rm(x)
rm(y)
rm(z)

foreach(i=1:length(objectsNamesVector), .combine=function (...) NULL, .multicombine=TRUE) %dopar% {
    print(paste("Loading object ", objectsNamesVector[i]," - ", i, " of ", length(objectsNamesVector), sep=""))
    load(file=paste(objectsNamesVector[i], ".RData", sep=""), envir=envir)
}

print(x)
print(y)
print(z)

The result of executing this code is (without the ">" prompts):

envir = .GlobalEnv

x <- "X test"
y <- "Y test"
z <- "Z test"

save(x, file="x.RData")
save(y, file="y.RData")
save(z, file="z.RData")

rm(x)
rm(y)
rm(z)

objectsNamesVector <- c("x", "y", "z")

foreach(i=1:length(objectsNamesVector), .combine=function (...) NULL,    .multicombine=TRUE) %do% {
+   print(paste("Loading object ", objectsNamesVector[i]," - ", i, " of ", length(objectsNamesVector), sep=""))
+   load(file=paste(objectsNamesVector[i], ".RData", sep=""), envir=envir)
+ }
[1] "Loading object x - 1 of 3"
[1] "Loading object y - 2 of 3"
[1] "Loading object z - 3 of 3"
NULL

print(x)
[1] "X test"
print(y)
[1] "Y test"
print(z)
[1] "Z test"
rm(x)
rm(y)
rm(z)

foreach(i=1:length(objectsNamesVector), .combine=function (...) NULL, .multicombine=TRUE) %dopar% {
+   print(paste("Loading object ", objectsNamesVector[i]," - ", i, " of ", length(objectsNamesVector), sep=""))
+   load(file=paste(objectsNamesVector[i], ".RData", sep=""), envir=envir)
+ }
NULL

print(x)
Error in print(x) : object 'x' not found
print(y)
Error in print(y) : object 'y' not found
print(z)
Error in print(z) : object 'z' not found

I understand that I cannot improve IO with foreach since IO is sequential on my architecture. I would just like to understand why this is not working...

Thank you for your answer.

Regards, Samo.

user859821 asked Jul 24 '11 00:07


3 Answers

I believe the issue is that %do% is able to write to the global environment, while %dopar% is not. %do% is very useful if you want the foreach() syntax and other goodies, but do not need a parallel backend.

Also, because %do% runs in sequence, keeping the global environment clean can be left to the user, as there won't be race conditions. In the parallel case, you can have race conditions (i.e. some parallel tasks may finish before others, which can create random, hard-to-reproduce outcomes).

Because of race conditions, it's not a good idea to have this kind of operation write directly to the global environment if you can avoid it. A later user may take such sequential code and replace %do% with %dopar%, hoping to get faster results, and end up with different results instead. To your credit, you've found a clean example of where that can occur.

Iterator answered Oct 05 '22 20:10


It's difficult to tell exactly what's going on without knowing:

  1. What your operating system is.
  2. What parallel backend you've registered for %dopar%.

If you're using doMC, then the code within the foreach block executes within a fork()'ed process. This means that it has its own memory space, and while it will modify .GlobalEnv locally, it will not modify it within the "master" process. That is, you end up modifying a copy of .GlobalEnv.
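The same separation can be seen without doMC. The sketch below is only an illustration under stated assumptions, not the asker's setup: it uses a socket cluster from base R's parallel package rather than a fork()'ed process, and the variable name worker_only_value is made up. The worker changes its own .GlobalEnv, and the master's .GlobalEnv stays untouched.

```r
library(parallel)

# Assumption: a single local socket worker is enough to show the effect.
cl <- makeCluster(1)

# Assign into .GlobalEnv on the worker, then check that the name exists there.
on_worker <- clusterEvalQ(cl, {
  assign("worker_only_value", 42, envir = .GlobalEnv)
  exists("worker_only_value", envir = .GlobalEnv)
})[[1]]

stopCluster(cl)

# The master process has its own .GlobalEnv, which the worker never touched.
on_master <- exists("worker_only_value", envir = .GlobalEnv)

print(c(worker = on_worker, master = on_master))
```

A forked worker starts from a copy of the master's memory rather than from a fresh session, but the point is the same: writes made in the worker do not travel back on their own.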

If you execute this code with no backend registered, it executes "correctly," because %dopar% ends up executing as %do% does.
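A quick way to check the sequential case is to register foreach's sequential backend explicitly, which should behave like an unregistered %dopar% minus the warning. This is a minimal sketch; the names v1, v2 and v3 are hypothetical and exist only for the demonstration.

```r
library(foreach)

# Register the sequential backend explicitly, so %dopar% runs in this process.
registerDoSEQ()
n_workers <- getDoParWorkers()

# Assign into .GlobalEnv from inside the loop body, as the question does with load().
invisible(foreach(i = 1:3) %dopar% {
  assign(paste0("v", i), i * 10, envir = .GlobalEnv)
})

# The loop ran in the master process, so the assignments should be visible here.
visible <- all(sapply(paste0("v", 1:3), exists, envir = .GlobalEnv))
print(visible)
```

Calling getDoParName() and getDoParWorkers() before a loop is a cheap way to confirm which backend %dopar% will actually use.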

One way to handle this situation might be to load objects into new environments, and then use foreach()'s .combine parameter to copy the contents of each of them into .GlobalEnv.
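One possible shape for that idea is sketched below, assuming the doParallel backend with a two-worker local cluster (the answer does not name a backend for this approach). Each task loads its file into a fresh environment and returns the contents as a named list; .combine = c merges the lists, and the copy into .GlobalEnv is done once in the master with list2env() rather than inside .combine. The data_dir and paths variables are illustrative.

```r
library(foreach)
library(doParallel)

# Assumption: a 2-worker local cluster; adjust to your machine.
cl <- makeCluster(2)
registerDoParallel(cl)

# Recreate the question's files, here in a temporary directory.
data_dir <- tempdir()
x <- "X test"; y <- "Y test"; z <- "Z test"
save(x, file = file.path(data_dir, "x.RData"))
save(y, file = file.path(data_dir, "y.RData"))
save(z, file = file.path(data_dir, "z.RData"))
rm(x, y, z)

objectsNamesVector <- c("x", "y", "z")
paths <- file.path(data_dir, paste0(objectsNamesVector, ".RData"))

# Each worker loads into its own new environment and returns a named list.
loaded <- foreach(path = paths, .combine = c) %dopar% {
  e <- new.env()
  load(file = path, envir = e)
  as.list(e)
}

stopCluster(cl)

# Copy the combined results into .GlobalEnv in the master process.
list2env(loaded, envir = .GlobalEnv)

print(x)
print(y)
print(z)
```

Returning the objects means they are serialized back to the master, so this is about correctness rather than speed.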

evanrsparks answered Oct 05 '22 21:10


I had the same problem when I tried to use "foreach" + "doSNOW" to run a parallel program on a 32-core computer. "foreach" stopped working and reported an "object not found" error. I did use ".export" in "foreach" to include that external object, but it still reported that the object was not found. When I tried "doParallel" instead of "doSNOW", it worked:

external_object <- 1

library(foreach)
library(doParallel)
registerDoParallel(cores=32)
getDoParWorkers()

foreach(i=1:32, .combine=c, .multicombine=TRUE, .export=c("external_object")) %dopar% { external_object }

Eric answered Oct 05 '22 20:10