I'm new to the parallel packages and have started exploring them in a bid to speed up some of my work. An annoyance I often encounter is that foreach throws errors when I have not used clusterExport to send the relevant functions/variables to the workers.
I know that the example below does not necessarily need foreach to be fast, but I'll use it for illustration's sake.
library(doParallel)
library(parallel)
library(lubridate)
library(foreach)

# Four local workers; type = "PSOCK" avoids the snow dependency of type = "SOCK"
cl <- makeCluster(4, type = "PSOCK")
registerDoParallel(cl)

# 500 dates sampled (with replacement) from 2010, formatted as dd-mm-yyyy strings
Dates <- sample(format(seq(ISOdate(2010, 1, 1), by = "day", length.out = 365),
                       format = "%d-%m-%Y"), 500, replace = TRUE)

foreach(i = seq_along(Dates), .combine = rbind) %dopar% dmy(Dates[i])
Error in dmy(Dates[i]) : task 1 failed - "could not find function "dmy""
As you can see, the error says that the dmy function cannot be found. I then have to add the following:
clusterExport(cl, c("dmy"))
So my question is: besides looking at the error for clues about what to export, is there a more elegant way of knowing beforehand which objects to export, or is there a way to share the global environment with all the workers before running the foreach?
There is no need to export individual package functions manually like that. You can use the .packages argument of foreach to load the required packages on each worker, so all of the package's functions will be available to your %dopar% expression.
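For example, here is a minimal sketch of the same loop using .packages, assuming the cluster from the question is already created and registered:

foreach(i = seq_along(Dates), .combine = rbind, .packages = "lubridate") %dopar%
  dmy(Dates[i])   # dmy is found because lubridate is loaded on each worker

stopCluster(cl)   # shut the workers down when you are finished

For variables you define yourself, foreach generally exports them automatically by analyzing the %dopar% body and copying referenced variables from the calling environment; the .export argument lets you name anything it misses.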