I'm new to the parallel packages and have started exploring them in a bid to speed up some of my work. An annoyance I often encounter is that foreach throws errors when I have not used clusterExport to send the relevant functions/variables to the workers.
I know that the example below does not necessarily need foreach to be fast, but I'll use it for illustration's sake.
library(doParallel)
library(parallel)
library(lubridate)
library(foreach)

# Four local workers; type = "PSOCK" avoids the snow dependency of type = "SOCK"
cl <- makeCluster(4, type = "PSOCK")
registerDoParallel(cl)

# 500 dates sampled (with replacement) from 2010, formatted as dd-mm-yyyy strings
Dates <- sample(format(seq(ISOdate(2010, 1, 1), by = "day", length.out = 365),
                       format = "%d-%m-%Y"), 500, replace = TRUE)

foreach(i = seq_along(Dates), .combine = rbind) %dopar% dmy(Dates[i])
Error in dmy(Dates[i]) : task 1 failed - "could not find function "dmy""
As you can see, the error says that the dmy function cannot be found. I then have to add the following:
clusterExport(cl, c("dmy"))
So my question is: besides looking at the error for clues about what to export, is there a more elegant way of knowing beforehand which objects to export, or is there a way to share the global environment with all the workers before running the foreach?
There is no need to export individual package functions manually like that. You can use the .packages argument of foreach to load the required packages on each worker, so all of the package's functions will be available to your %dopar% expression.
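For example, here is a minimal sketch of the same loop using .packages, assuming the cluster from the question is already created and registered:

foreach(i = seq_along(Dates), .combine = rbind, .packages = "lubridate") %dopar%
  dmy(Dates[i])   # dmy is found because lubridate is loaded on each worker

stopCluster(cl)   # shut the workers down when you are finished

For variables you define yourself, foreach generally exports them automatically by analyzing the %dopar% body and copying referenced variables from the calling environment; the .export argument lets you name anything it misses.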