Submit jobs to a slave node from within an R script?

I want to get myscript.R to run on a cluster slave node using a job scheduler (specifically, PBS).

Currently, I submit an R script to a slave node using the following command:

qsub -S /bin/bash -p -1 -cwd -pe mpich 1 -j y -o output.log ./myscript.R

Are there functions in R that would allow me to run myscript.R on the head node and send individual tasks to the slave nodes? Something like:

foreach(i = c('file1.csv', 'file2.csv'), pbsoptions = list()) %do% read.csv(i)

Update: an alternative to the qsub command above is to remove the #!/usr/bin/Rscript shebang from the first line of myscript.R and have qsub run the script through Rscript directly, as pointed out by @Josh:

qsub -S /usr/bin/Rscript -p -1 -cwd -pe mpich 1 -j y -o output.log myscript.R
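
For reference, a script submitted this way needs no shebang, because -S already routes it through the Rscript interpreter. A minimal hypothetical myscript.R (the workload is illustrative, not from the question):

# myscript.R -- no "#!/usr/bin/Rscript" first line is needed, since
# qsub -S /usr/bin/Rscript runs this file through Rscript itself
dat <- read.csv("file1.csv")   # hypothetical workload
print(summary(dat))            # stdout is captured in output.log via -o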
asked Oct 30 '12 by David LeBauer

2 Answers

If you want to submit jobs from within an R script, I suggest that you look at the "BatchJobs" package. Here is a quote from the DESCRIPTION file:

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine.

BatchJobs appears to be more sophisticated than previous, similar packages, such as Rsge and Rlsf. There are functions for registering, submitting, and retrieving the results of jobs. Here's a simple example:

library(BatchJobs)
reg <- makeRegistry(id='test')   # create a registry that tracks the jobs
batchMap(reg, sqrt, x=1:10)      # define one job per element of x
submitJobs(reg)                  # hand the jobs to the batch queueing system
y <- loadResults(reg)            # collect the results once the jobs finish

You need to configure BatchJobs to use your batch queueing system. The submitJobs "resources" argument can be used to request appropriate resources for the jobs.
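
For example, a hedged sketch: the available resource names depend on your BatchJobs template file and cluster, so walltime and memory here are assumptions.

# request per-job resources; the names must match placeholders in
# your BatchJobs template file -- these two are only illustrative
submitJobs(reg, resources = list(walltime = 3600, memory = 4096))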

This approach is very useful if your cluster doesn't allow long-running jobs, or if it severely restricts their number. BatchJobs lets you get around those restrictions by breaking your work into multiple jobs, while hiding most of the bookkeeping that doing so manually would involve, as sketched below.
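
As a sketch of that pattern, assuming the registry from the example above: BatchJobs also provides a chunk() helper that bundles many small tasks into fewer scheduler jobs.

ids <- getJobIds(reg)   # ids of the 10 sqrt tasks registered above
# bundle the tasks into 2 scheduler jobs of 5 tasks each, so short
# tasks don't each pay the scheduler's per-job overhead
submitJobs(reg, ids = chunk(ids, n.chunks = 2))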

Documentation and examples are available at the project website.

answered by Steve Weston


For most of our work, we run multiple R sessions in parallel using qsub instead.

For multiple files, I normally do:

# submit one job per input file; the filename is passed into the
# job environment with "qsub -v"
while read infile rest
do
    qsub -v infile="$infile" call_r.sh
done < list_of_infiles.txt
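
Here list_of_infiles.txt holds one input filename per line (anything after the first whitespace lands in the unused rest variable); for illustration, reusing the filenames from the question:

file1.csv
file2.csv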

call_r.sh:

...
# $infile is set by "qsub -v"; R warns that the extra argument is
# ignored as an option, but it is still visible via commandArgs()
R --vanilla -f analyse_file.R $infile
...

analyse_file.R:

args <- commandArgs()                      # includes the R binary and its options
infile <- args[5]                          # position 5 is the extra argument above
outfile <- paste(infile, ".out", sep="")   # e.g. file1.csv.out
...

Then I combine all the output afterwards...
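
A sketch of that combining step in R, assuming the .out files are themselves CSV (the suffix follows analyse_file.R above; read.csv is an assumption about their format):

# gather every per-file result written by analyse_file.R
outfiles <- list.files(pattern = "\\.out$")
combined <- do.call(rbind, lapply(outfiles, read.csv))  # assumes CSV output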

answered by pallevillesen