 

R programming - submitting jobs on a multiple node linux cluster using PBS

I am running R on a multiple node Linux cluster. I would like to run my analysis on R using scripts or batch mode without using parallel computing software such as MPI or snow.

I know this can be done by dividing the input data such that each node runs different parts of the data.

My question is how do I go about this exactly? I am not sure how I should code my scripts. An example would be very helpful!

So far I have been submitting my scripts through PBS, but everything seems to run on a single node, since R is a single-threaded program. Hence, I need to figure out how to adjust my code so that the work is distributed across all of the nodes.

Here is what I have been doing so far:

1) command line:

> qsub myjobs.pbs

2) myjobs.pbs:

> #!/bin/sh
> #PBS -l nodes=6:ppn=2
> #PBS -l walltime=00:05:00
> #PBS -l arch=x86_64
> 
> pbsdsh -v $PBS_O_WORKDIR/myscript.sh

3) myscript.sh:

#!/bin/sh
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R

4) my_script.R:

> library(survival)
> ...
> write.table(test, "TESTER.csv", sep=",", row.names=F, quote=F)

Any suggestions will be appreciated! Thank you!

-CC

asked Jun 29 '10 by CCA

2 Answers

This is really a PBS question. I usually write an R script (with the path to Rscript after #!) and have it read a parameter (via the commandArgs function) that controls which "part of the job" the current instance should handle. Because I use multicore a lot, I usually only need 3-4 nodes, so I just submit a few jobs, calling this R script with each of the possible control-argument values.

On the other hand, your use of pbsdsh should do the job; the value of PBS_TASKNUM can then be used as the control parameter.
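As a rough sketch of that second approach: the per-task script that pbsdsh launches could use PBS_TASKNUM to pick its own slice of the input. The chunk_<n>.csv naming scheme below is an illustrative assumption, not part of the original setup, and the default value only exists so the logic can run outside PBS.

```shell
#!/bin/sh
# Sketch of a per-task wrapper for pbsdsh. Under PBS, pbsdsh launches one
# copy of this script per task and sets PBS_TASKNUM to a distinct number;
# the fallback below is only for running the snippet outside PBS.
PBS_TASKNUM=${PBS_TASKNUM:-0}

# Each task picks the input chunk matching its task number
# (chunk_0.csv, chunk_1.csv, ... -- a made-up naming convention).
INFILE="chunk_${PBS_TASKNUM}.csv"

# Hand the chosen file to R as a trailing argument; inside R it can be
# read back with commandArgs(). Echoed here rather than executed.
echo R CMD BATCH --no-save "--args ${INFILE}" my_script.R
```

Each of the 12 tasks requested by `nodes=6:ppn=2` would then work on a different chunk of the data.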

answered Sep 28 '22 by mbq


This was originally an answer to a related question, but it addresses the comment above as well.

For most of our work we do run multiple R sessions in parallel using qsub (instead).

If it is for multiple files I normally do:

while read infile rest
do
    qsub -v infile=$infile call_r.pbs
done < list_of_infiles.txt

call_r.pbs:

...
R --vanilla -f analyse_file.R $infile
...

analyse_file.R:

args <- commandArgs()
infile <- args[5]   # 5th element: the trailing file name passed on the command line
outfile <- paste(infile, ".out", sep="")
...
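For reference, here is a small sketch of why the file name lands in the fifth position of commandArgs() for a call like the one in call_r.pbs; the file name mydata.txt is made up for illustration.

```shell
# Simulate the argument vector R sees for:
#   R --vanilla -f analyse_file.R mydata.txt
# commandArgs() would roughly contain:
#   [1] R binary  [2] --vanilla  [3] -f  [4] analyse_file.R  [5] mydata.txt
set -- R --vanilla -f analyse_file.R mydata.txt
echo "args[5] would be: $5"
```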

Then I combine all the output afterwards...
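The final collection step can be as simple as concatenating the per-input results; a minimal sketch, assuming each run wrote <infile>.out as in analyse_file.R above (the sample files and contents are made up so the snippet is self-contained):

```shell
# Create two fake per-input result files, standing in for what the
# parallel R jobs would have written as <infile>.out.
printf 'result for one\n' > one.txt.out
printf 'result for two\n' > two.txt.out

# Gather every per-input result into a single combined file.
cat *.out > combined_results.txt
```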

answered Sep 28 '22 by pallevillesen