I am trying to use a high-performance cluster at my institution for the first time, and I have hit a problem that I can't resolve.
The following code returns an error:
    ptime <- system.time({
      r <- foreach(z = 1:length(files), .combine = cbind) %dopar% {
        raster <- raster(paste(folder, files[1], sep = ""))
        data <- getValues(raster)
        clp <- na.omit(data)
        for (i in 1:length(classes)) {
          results[i, z] <- length(clp[clp == classes[i]]) / length(clp)
          print(z)
        }
      }
    })

Error in { : task 1 failed - "could not find function "raster""
I also tried a different foreach call for another task:
    r <- foreach(i = 1:length(poly)) %dopar% {
      clip <- gIntersection(paths, poly[i, ])
      lgth <- gLength(clip)
      vid <- poly@data[i, 3]
      path.lgth[i, ] <- c(vid, lgth)
      print(i)
    }
and this time the gIntersection function isn't found. The packages are definitely installed and loaded. After reading some forum posts, it seems to have something to do with the environment the functions execute in.
Can someone please help? I'm not a programmer!
Thank you!
Update:
I have adjusted my code based on the solution provided:
    results <- matrix(nrow = length(classes), ncol = length(files))
    dimnames(results)[[1]] <- classes
    dimnames(results)[[2]] <- files

    ptime <- system.time({
      foreach(z = 1:length(files), .packages = "raster") %dopar% {
        raster <- raster(paste(folder, files[z], sep = ""))
        data <- getValues(raster)
        clp <- na.omit(data)
        for (i in 1:length(classes)) {
          results[i, z] <- length(clp[clp == classes[i]]) / length(clp)
          print(z)
        }
      }
    })
But what I get is an output (my results matrix) filled with NAs. As you can see, I create a matrix object called results to hold the results (which works with for loops), but after reading the foreach documentation it seems that results are saved differently with this function.
Any advice on what I should choose for the .combine argument?
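From the documentation, the pattern seems to be that each iteration returns a value and .combine stitches the returned values together. A minimal sketch of what I understand (the names here are illustrative):

    library(foreach)
    # each iteration's last expression is its result; cbind pastes the
    # returned vectors together as the columns of a matrix
    m <- foreach(z = 1:3, .combine = cbind) %do% {
      c(a = z, b = z * 2)
    }
    m  # a 2 x 3 matrix, one column per iteration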
In the vignette and the help page of foreach, the argument .packages is pointed out as necessary to provide when using parallel computation with functions that are not loaded by default. So your code in the first example should be:
    ptime <- system.time({
      r <- foreach(z = 1:length(files), .combine = cbind, .packages = 'raster') %dopar% {
        # some code
        # and more code
      }
    })
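For instance, a sketch of the whole loop might look like this (untested; it assumes folder, files and classes are defined as in your question, and that a doParallel backend is registered). Instead of assigning into results inside the loop, each iteration returns its vector of class proportions, and .combine = cbind assembles the matrix:

    library(foreach)
    library(doParallel)
    cl <- makeCluster(4)  # pick a worker count that suits your cluster
    registerDoParallel(cl)

    ptime <- system.time({
      results <- foreach(z = 1:length(files), .combine = cbind,
                         .packages = "raster") %dopar% {
        r   <- raster(paste(folder, files[z], sep = ""))
        clp <- na.omit(getValues(r))
        # one proportion per class; this vector is the iteration's result
        sapply(classes, function(cls) length(clp[clp == cls]) / length(clp))
      }
    })
    dimnames(results) <- list(classes, files)

    stopCluster(cl)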
Some more explanation
The foreach package does a lot of setting up behind the scenes. What happens is the following (in principle; the technical details are a tad more complicated):

- foreach sets up a system of "workers", which you can see as separate R sessions that are each committed to a different core in a cluster.
- The function that needs to be carried out is loaded into each worker session, together with the objects needed to carry out the function.
- Each worker calculates the result for a subset of the data.
- The results of the calculations on the different workers are put together and reported in the "master" R session.
As the workers can be seen as separate R sessions, packages from the "master" session are not automatically loaded. You have to specify which packages should be loaded in those worker sessions, and that's what the .packages argument of foreach is used for.
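You can see this isolation in a toy example (a sketch assuming a doParallel backend; toTitleCase lives in the tools package):

    library(foreach)
    library(doParallel)
    cl <- makeCluster(2)
    registerDoParallel(cl)

    library(tools)  # attached in the master session only
    # this fails with: could not find function "toTitleCase",
    # because the workers never attached tools:
    # foreach(i = 1:2) %dopar% toTitleCase("hello world")

    # with .packages, each worker loads tools first, so it works:
    foreach(i = 1:2, .packages = "tools") %dopar% toTitleCase("hello world")

    stopCluster(cl)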
Note that when you use other packages (e.g. parallel or snowfall), you'll have to set up these workers explicitly, and also take care of passing objects and loading packages in the worker sessions yourself.
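For comparison, a sketch of that explicit setup with the parallel package (the objects and numbers here are arbitrary, and it assumes raster is installed):

    library(parallel)
    cl <- makeCluster(2)
    clusterEvalQ(cl, library(raster))  # you load packages on every worker yourself
    x <- 10
    clusterExport(cl, "x")             # you ship objects to the workers yourself
    res <- parLapply(cl, 1:4, function(i) i * x)
    stopCluster(cl)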
I dealt with the same problem. My solution is:
Function.R
f <- function(parameters...){Body...}
MainFile.R
    library(foreach)
    library(doParallel)
    cores <- detectCores()
    cl <- makeCluster(cores[1] - 2)  # not to overload your computer
    registerDoParallel(cl)
    clusterEvalQ(cl, .libPaths("C:/R/win-library/4.0"))  # give your R library path

    output <- foreach(i = 1:5, .combine = rbind) %dopar% {
      source("~/Function.R")  # that is the main point: source your function file here
      temp <- f(parameters...)  # use your custom function after sourcing
      temp
    }

    stopCluster(cl)
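An alternative to sourcing the file inside the loop is to source or define the function once in the master session and let foreach export it to the workers; a sketch with a made-up function f:

    library(foreach)
    library(doParallel)
    cl <- makeCluster(2)
    registerDoParallel(cl)

    f <- function(i) i^2  # stand-in for your own function
    # f is defined in the calling environment, so foreach copies it
    # to each worker automatically
    output <- foreach(i = 1:5, .combine = rbind) %dopar% f(i)

    stopCluster(cl)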