
Incorporating bash scripts into an R package?

Tags: bash, r, workflow

Background

I am writing an R package to support reproducible research. At this point, the workflow is mostly held together by bash scripts, and I can run an analysis by sending a single command like ./runscript.sh. I use bash for the following:

  1. file manipulation with tar, rsync, and rename
  2. running bash files locally and via ssh
  3. running R scripts using R --vanilla that in turn call R functions
  4. finding and replacing text within files using sed
  5. submitting jobs via qsub

It seems to me that it would be much more efficient (cleaner and easier) to execute the entire workflow from an R function or R script. I am partial to R since I am more familiar with it and mostly work within emacs ESS.

Questions

  1. Would it be worthwhile to encapsulate all of these uses of bash within R using the system and files functions?

  2. Are there other R packages that I have not yet found that would be helpful for doing this?
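As a rough illustration of question 1, step 3 of the list above (running an R script in a fresh session) could be driven from R itself with system2(). This is only a sketch: the script contents and the variable names script, rscript, and out are made up for the example, and Rscript stands in for R --vanilla.

```r
# Write a tiny stand-in script (hypothetical content) to a temporary file.
script <- tempfile(fileext = ".R")
writeLines('cat("analysis done\\n")', script)

# Use the Rscript that belongs to the running R installation,
# so the call does not depend on what happens to be on the PATH.
rscript <- file.path(R.home("bin"), "Rscript")

# --vanilla skips saved workspaces and profile files, as in the bash version.
# stdout = TRUE captures the script's output as a character vector.
out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
print(out)
```

The same pattern should extend to the ssh and qsub steps by swapping the command and arguments, although those depend on the local setup.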

Notes

Following aL3xa's answer, I realize it is important to note that the speed penalty of using, e.g., the R rather than the bash versions of tar and gsub on 1000-2000 files would likely be smaller than the current rate-limiting steps in the workflow: computations by JAGS (~10-20 min) and FORTRAN (>4 hrs).

David LeBauer, asked Feb 23 '11 17:02


2 Answers

I'm a big fan of using R as your "integrated" environment vs. bash scripts. I'm in the process of moving all of my bash and ruby scripts to Rscript as I need to make changes to them.

Only a couple of reasons not to move everything into R come to mind. I'm referring mainly to using Rscript to accomplish this.

1) Speed. In my testing the impact has been moderate at worst in any situation I've come across, and it would be trivial relative to the run times you mentioned.

2) Portability, in that paths to Rscript, etc. may differ across systems. I've had no problems writing things on OS X and moving them to a Linux server, but they might break on Windows.
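One way to soften the portability concern is to avoid hard-coded paths altogether. The sketch below assumes the external tools are on the PATH; find_tool and out_dir are hypothetical names chosen for the example, not part of any package.

```r
#!/usr/bin/env Rscript
# The shebang above lets `env` locate Rscript instead of hard-coding its path.

# Resolve an external tool at run time and fail early if it is missing.
find_tool <- function(name) {
  path <- unname(Sys.which(name))
  if (!nzchar(path)) stop("required tool not found on PATH: ", name)
  path
}

# file.path() joins path components with a separator R accepts on every platform.
out_dir <- file.path(tempdir(), "analysis", "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
```

Calling find_tool("rsync") at the top of a script would then stop with a clear message on a machine where rsync is not installed, rather than failing halfway through the workflow.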

The advantages in my book are:

1) Much easier for me to write. I don't have to switch back and forth between each language's slight idiosyncrasies in things like conditional statements and for loops.

2) More forgiving. I can't describe how much time I've spent trying to get bash scripts to work because I accidentally had a space where I shouldn't have. R is much nicer in that regard (yes, of course, we should all follow conventions in R perfectly, but I'd rather it not stall me for hours if I don't).

3) I do better work. For tarring a file it doesn't matter, but I find I do better text manipulation in R than in awk/sed, for example.
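To make the text-manipulation point concrete, here is a minimal sketch of an in-place find and replace, roughly what sed -i 's/pattern/replacement/g' does. The helper name replace_in_file and the file contents are invented for the example.

```r
# Replace every match of `pattern` in a file and write the result back.
# fixed = TRUE treats `pattern` as a literal string instead of a regex.
replace_in_file <- function(path, pattern, replacement, fixed = FALSE) {
  lines <- readLines(path, warn = FALSE)
  writeLines(gsub(pattern, replacement, lines, fixed = fixed), path)
  invisible(path)
}

# Demo on a temporary file with hypothetical settings.
cfg <- tempfile(fileext = ".txt")
writeLines(c("n.iter = 1000", "outdir = /tmp/run1"), cfg)
replace_in_file(cfg, "run1", "run2", fixed = TRUE)
print(readLines(cfg))
```

Because the lines are an ordinary character vector between reading and writing, any further R logic (filtering, conditional edits, logging what changed) can be added without leaving the language.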

Re: packages that are helpful -- This doesn't exist, to my knowledge, but I'd love a version of make that's based on R. make's syntax is one of the most inflexible out there (tabs vs spaces? really?) - I'd love to write an R-based alternative. Some day, I will...

Noah, answered Oct 06 '22 23:10


Well, there are R functions like tar, gsub, etc. Anyway, I assume you want to create a cross-platform solution. You should prefer bash for the sake of speed, and use R only for R-specific functions. I don't find it useful to wrap all system-based commands within system and/or file.*; it would be much slower. If you're using Linux, I suggest littler over the Rscript interface.
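For comparison, this is roughly what the built-in route looks like. It is a sketch only: the directory layout is invented, and submit_job is a hypothetical wrapper that is defined but never called here, since qsub exists only on a cluster.

```r
# Set up a small demo directory under tempdir() (hypothetical layout).
work <- file.path(tempdir(), "tar-demo")
dir.create(file.path(work, "inputs"), recursive = TRUE, showWarnings = FALSE)
writeLines("x", file.path(work, "inputs", "a.txt"))

# Work from inside the directory so paths stored in the archive stay relative.
old <- setwd(work)
utils::tar("inputs.tar.gz", files = "inputs", compression = "gzip")
listed <- utils::untar("inputs.tar.gz", list = TRUE)  # archive contents

# file.rename() covers the simple `mv` case; it returns TRUE on success.
renamed <- file.rename(file.path("inputs", "a.txt"),
                       file.path("inputs", "b.txt"))
setwd(old)

# Tools with no R equivalent can still be reached through system2().
submit_job <- function(script) {
  system2("qsub", args = shQuote(script), stdout = TRUE)
}
```

Whether this is slower than the bash equivalent in practice depends on the number of files and the platform, so it is worth timing on a real run before deciding either way.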

aL3xa, answered Oct 06 '22 22:10