 

Knowing what objects to clusterExport beforehand

I'm new to the parallel packages and have started exploring them to speed up some of my work. An annoyance I often run into is that foreach throws errors when I haven't exported the relevant functions or variables with clusterExport.

Example

I know that the example below does not really need foreach to be fast, but I'll use it for illustration's sake.

library(doParallel)
library(parallel)
library(lubridate)
library(foreach)

cl <- makeCluster(c("localhost", "localhost", "localhost", "localhost"), type = "PSOCK")
registerDoParallel(cl, cores = 4)

Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 500, replace = TRUE)

foreach(i = seq_along(Dates), .combine = rbind) %dopar% dmy(Dates[i])

Error in dmy(Dates[i]) : task 1 failed - "could not find function "dmy""

As you can see, the error says that the dmy function cannot be found. I then have to add the following:

clusterExport(cl, c("dmy"))

So my question is: besides reading the error messages for clues about what to export, is there a more elegant way to know beforehand which objects to export, or is there a way to share the global environment with all the workers before running foreach?

asked May 21 '12 at 15:05 by R J



1 Answer

There's no need to export individual package functions manually like that. Use the .packages argument of foreach to load the required packages on each worker, so that all of their functions are available to your %dopar% expression.
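A minimal sketch of the question's example rewritten this way, assuming lubridate is installed in the library the workers use (true for local workers like these). It uses makeCluster(4), which starts four local PSOCK workers, a simplified version of the question's sample data, and .combine = c instead of rbind so that the result stays a Date vector:

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Four local PSOCK workers, equivalent to the question's four "localhost" entries
cl <- makeCluster(4)
registerDoParallel(cl)

# Simplified version of the question's data: 500 random 2010 dates as "dd-mm-yyyy"
Dates <- format(sample(seq(as.Date("2010-01-01"), by = "day", length.out = 365),
                       500, replace = TRUE), "%d-%m-%Y")

# .packages loads lubridate on every worker before the loop body runs,
# so dmy() is found there without any clusterExport() call
parsed <- foreach(i = seq_along(Dates), .combine = c,
                  .packages = "lubridate") %dopar% dmy(Dates[i])

stopCluster(cl)
```

Only the package is named in .packages. Dates needs no special handling because the doParallel backend normally exports variables referenced in the loop body from the calling environment, which is why the original error mentioned only dmy. For objects that this automatic detection misses, foreach also accepts an .export argument (a character vector of variable names).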

answered Oct 29 '22 at 05:10 by Joshua Ulrich
