I've got access to a big, powerful cluster. I'm a halfway decent R programmer, but totally new to shell commands (and to terminal commands in general, beyond the basics one needs to use Ubuntu).
I want to use this cluster to run a bunch of parallel processes in R, and then I want to combine them. Specifically, I have a problem analogous to:
my.function <- function(data, otherdata, N) {
  mod = lm(y ~ x, data = data)
  a = predict(mod, newdata = otherdata, se.fit = TRUE)
  b = rnorm(N, a$fit, a$se.fit)
  b
}
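r1 = my.function(data, otherdata, N)
r2 = my.function(data, otherdata, N)
r3 = my.function(data, otherdata, N)
r4 = my.function(data, otherdata, N)
...
r1000 = my.function(data, otherdata, N)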
results = list(r1,r2,r3,r4, ... r1000)
The above is just a dumb example, but basically I want to do something 1000 times in parallel, and then do something with all of the results from the 1000 processes.
How do I submit 1000 jobs simultaneously to the cluster, and then combine all the results, like in the last line of the code?
Any recommendations for well-written manuals/references for me to go RTFM with would be welcome as well. Unfortunately, the documents that I've found aren't particularly intelligible.
Thanks in advance!
You can combine plyr with the doMC package (a parallel backend for the foreach package) as follows:
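library(plyr)
library(doMC)
registerDoMC(20)  # register 20 cores as parallel workers
# Call my.function 1000 times in parallel; llply returns a list of the 1000 results
results <- llply(1:1000, function(idx) {
  my.function(data, otherdata, N)  # arguments as defined in the question
}, .parallel = TRUE)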
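For a version that can be run end to end without installing anything, here is a minimal sketch of the same idea using mclapply from the parallel package that ships with R, swapped in for plyr and doMC. The toy data, N = 3, and the choice of 2 cores are assumptions made only so the example is self-contained; substitute your own data and core count.

```r
library(parallel)  # ships with base R

# Hypothetical toy data standing in for the question's `data` and `otherdata`
set.seed(1)
data <- data.frame(x = rnorm(50))
data$y <- 2 * data$x + rnorm(50)
otherdata <- data.frame(x = c(-1, 0, 1))

# The function from the question
my.function <- function(data, otherdata, N) {
  mod <- lm(y ~ x, data = data)
  a <- predict(mod, newdata = otherdata, se.fit = TRUE)
  rnorm(N, a$fit, a$se.fit)
}

# mclapply forks worker processes; forking is not available on Windows,
# so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Run the function 1000 times; the return value is a list of 1000 results
results <- mclapply(1:1000,
                    function(idx) my.function(data, otherdata, N = 3),
                    mc.cores = n_cores)

# Combine: one row per run, one column per simulated value
combined <- do.call(rbind, results)
dim(combined)
```

Either way, the list that comes back plays the role of the `results = list(r1, ..., r1000)` line in the question, and do.call(rbind, ...) is one way to collapse it into a single matrix.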
Edit: If you're talking about submitting simultaneous jobs, don't you have an LSF license? If so, you can use bsub to submit as many jobs as you need, and it also takes care of load-balancing and so on.
Edit 2: A small note on load-balancing (example using LSF's bsub):
What you mention is similar to what I wrote here => LSF. You can submit jobs in batches. For example, in LSF you can use bsub to submit a job to the cluster like so:
bsub -m <nodes> -q <queue> -n <processors> -o <output.log> -e <error.log> Rscript myscript.R
This places your job on the queue. Once the number of processors you asked for becomes available, they are allocated to you and your job starts running. You can pause, restart, and suspend your jobs, and much more. qsub, the submission command of schedulers such as SGE and PBS, follows a similar concept. The learning curve may be a bit steep, but it is worth it.
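To tie the scheduler route back to the "combine all the results" part of the question, one common pattern is to have each job write its result to a file and to run a separate combine step once all jobs have finished. The following is only a sketch under assumptions: the function names run_task and combine_results, the result_NNNN.rds file naming, and the idea of passing a job index on the command line are all hypothetical choices, not something bsub or qsub requires.

```r
# Worker side: what a script such as myscript.R might do for one job
run_task <- function(idx, out_dir) {
  set.seed(idx)    # distinct, reproducible seed per job
  res <- rnorm(5)  # placeholder for my.function(data, otherdata, N)
  saveRDS(res, file.path(out_dir, sprintf("result_%04d.rds", idx)))
}

# Combine side: run once, after every job has written its file
combine_results <- function(out_dir) {
  files <- sort(list.files(out_dir, pattern = "^result_[0-9]+\\.rds$",
                           full.names = TRUE))
  lapply(files, readRDS)
}

# In a real job the index could come from the command line, e.g.
#   Rscript myscript.R 17
# and be read inside the script with:
#   idx <- as.integer(commandArgs(trailingOnly = TRUE)[1])

# Local demonstration: 10 "jobs" writing to a temporary directory
out_dir <- file.path(tempdir(), "cluster_results")
dir.create(out_dir, showWarnings = FALSE)
for (idx in 1:10) run_task(idx, out_dir)

results <- combine_results(out_dir)
length(results)
```

On a cluster, out_dir would need to be a directory that every node can see, such as a shared filesystem, since each job runs in its own R process and cannot return objects to your session directly.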