I've been searching around the internet, trying to understand parallel processing.
What they all seem to assume is that I have some kind of loop operating on, e.g., every Nth row of a data set, split among N cores and combined afterwards, and I keep getting pointed towards parallelized apply() functions.
(Warning, ugly code below)
My situation, though, is that what I have is of the form
tempJob <- myFunction(filepath, string.arg1, string.arg2)
where the path is a file location, and the string arguments are various ways of sorting my data.
My current workflow is simply amassing a lot of
tempjob1 <- myFunction(args)
tempjob2 <- myFunction(other args)
...
tempjobN <- myFunction(some other args here)
# Make a list of all temporary outputs in the global environment
temp.list <- lapply(ls(pattern = "temp"), get)
# Stack them all (rbindlist() comes from the data.table package)
df <- data.table::rbindlist(temp.list)
# Remove all variables from workspace matching "temp"
rm(list=ls(pattern="temp"))
These jobs are entirely independent and could in principle be run in 8 separate instances of R (although that would be a bother to manage, I guess). How can I farm the first 8 jobs out to 8 cores, so that whenever a core finishes a job and returns the processed dataset to the global environment, it simply picks up whichever job is next in line?
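For what it's worth, here is a sketch of the same serial workflow with the jobs gathered into a list up front (the file paths and argument names below are made up); it avoids the ls(pattern = "temp") step and should make the jobs easier to hand off to whatever parallel machinery fits:
library(data.table)
# Hypothetical job specifications -- one list entry per myFunction() call
jobs <- list(
  list(filepath = "data1.csv", arg1 = "sortA", arg2 = "sortB"),
  list(filepath = "data2.csv", arg1 = "sortC", arg2 = "sortD")
  # ... and so on for the remaining jobs
)
# Run each job and stack the results, as before
temp.list <- lapply(jobs, function(job) myFunction(job$filepath, job$arg1, job$arg2))
df <- rbindlist(temp.list)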
With the future package (I'm the author) you can achieve what you want with a minor modification to your code: use "future" assignments %<-% instead of regular assignments <- for the code you want to run asynchronously.
library("future")
plan(multisession)
tempjob1 %<-% myFunction(args)
tempjob2 %<-% myFunction(other args)
...
tempjobN %<-% myFunction(some other args here)
temp.list <- lapply(ls(pattern = "temp"), get)
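From there the rest of the original workflow should carry over unchanged (a sketch, assuming data.table's rbindlist() as in the question). Accessing a future-assigned variable blocks until its value is ready, so the get() calls above wait for each job to finish before returning:
library(data.table)
# Stack the resolved results and clean up, as in the serial version
df <- rbindlist(temp.list)
rm(list = ls(pattern = "temp"))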
EDIT 2022-01-04: plan(multiprocess) -> plan(multisession), since multiprocess is deprecated and will eventually be removed.
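If the jobs are gathered into a list as sketched in the question, a hedged alternative (not from the original answer) is the future.apply package, which wraps the same future backend in an lapply-style call; the jobs object and the workers = 8 setting below are assumptions:
library(future.apply)
library(data.table)
plan(multisession, workers = 8)   # 8 background R sessions
# One future per job (future.chunk.size = 1), so a new job is launched as a worker frees up
temp.list <- future_lapply(jobs, function(job) {
  myFunction(job$filepath, job$arg1, job$arg2)
}, future.chunk.size = 1)
df <- rbindlist(temp.list)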