Difference between Rscript and littler

...besides the fact that Rscript is invoked with #!/usr/bin/env Rscript and littler with #!/usr/local/bin/r (on my system) in the first line of the script file. I've found certain differences in execution speed (littler seems to be a bit slower).

I created two dummy scripts, ran each 1000 times, and compared the average execution times.

Here's the Rscript file:

    #!/usr/bin/env Rscript

    btime <- proc.time()
    x <- rnorm(100)
    print(x)
    print(plot(x))
    etime <- proc.time()
    tm <- etime - btime
    sink(file = "rscript.r.out", append = TRUE)
    cat(paste(tm[1:3], collapse = ";"), "\n")
    sink()
    print(tm)

and here's the littler file:

    #!/usr/local/bin/r

    btime <- proc.time()
    x <- rnorm(100)
    print(x)
    print(plot(x))
    etime <- proc.time()
    tm <- etime - btime
    sink(file = "little.r.out", append = TRUE)
    cat(paste(tm[1:3], collapse = ";"), "\n")
    sink()
    print(tm)

As you can see, they are almost identical (only the first line and the sink file argument differ). The timings are sunk to a text file and then imported into R with read.table. I created a bash script to execute each script 1000 times, then calculated the averages.

Here's the bash script:

    for i in `seq 1000`
    do
        ./$1
        echo "####################"
        echo "Iteration #$i"
        echo "####################"
    done

And the results are:

    # littler script
    > mean(lit)
        user   system  elapsed
    0.489327 0.035458 0.588647
    > sapply(lit, median)
       L1    L2    L3
    0.490 0.036 0.609

    # Rscript
    > mean(rsc)
        user   system  elapsed
    0.219334 0.008042 0.274017
    > sapply(rsc, median)
       R1    R2    R3
    0.220 0.007 0.258
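For reference, here is a minimal sketch of how such averages could be computed from one of the sink files. It assumes each line holds user;system;elapsed, as the cat(paste(...)) call writes them. The sample values, the temporary file, and the column names are hypothetical stand-ins for little.r.out. In recent versions of R, mean() on a data frame no longer returns column means, so the sketch uses colMeans() instead.

```r
# Hypothetical sample standing in for "little.r.out": one run per line,
# written as user;system;elapsed.
out_file <- tempfile(fileext = ".out")
writeLines(c("0.49;0.036;0.61",
             "0.48;0.034;0.58",
             "0.50;0.036;0.60"), out_file)

# Import the semicolon-separated timings; the column names are assumed.
timings <- read.table(out_file, sep = ";",
                      col.names = c("user", "system", "elapsed"))

col_means   <- colMeans(timings)         # per-column mean
col_medians <- sapply(timings, median)   # per-column median

print(col_means)
print(col_medians)
```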

Long story short: besides the (obvious) execution-time difference, is there any other difference? The more important question is: why should or shouldn't you prefer littler over Rscript (or vice versa)?

aL3xa asked Jul 08 '10 15:07


1 Answer

Couple quick comments:

  1. The path /usr/local/bin/r is arbitrary; you can use /usr/bin/env r as well, as we do in some examples. As I recall, that limits what other arguments you can give to r, as it takes only one when invoked via env.

  2. I don't understand your benchmark, and why you'd do it that way. We do have timing comparisons in the sources, see tests/timing.sh and tests/timing2.sh. Maybe you want to split the test between startup and graph creation or whatever you are after.

  3. Whenever we ran those tests, littler won. (It still won when I re-ran them just now.) That made sense to us because, if you look at the sources of Rscript.exe, it works differently: it sets up the environment and a command string before eventually calling execv(cmd, av). littler can start a little quicker.

  4. The main price is portability. The way littler is built, it won't make it to Windows. Or at least not easily. OTOH we have RInside ported so if someone really wanted to...

  5. Littler came first in September 2006 versus Rscript which came with R 2.5.0 in April 2007.

  6. Rscript is now everywhere where R is. That is a big advantage.

  7. Command-line options are a little more sensible for littler in my view.

  8. Both work with CRAN packages getopt and optparse for option parsing.
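As a rough illustration of the split suggested in point 2, the sketch below times data generation and plotting separately inside one R session. It is not the code from tests/timing.sh. The variable names and the null PDF device are assumptions made for the example, and interpreter startup (where r and Rscript would differ) is not measured here.

```r
# Open a null graphics device so plot() does not write an Rplots.pdf file.
pdf(NULL)

# Phase 1: generate and print the data.
t_data <- system.time({
  x <- rnorm(100)
  print(x)
})

# Phase 2: draw the plot.
t_plot <- system.time(plot(x))

invisible(dev.off())

# user, system and elapsed times for each phase, side by side.
print(rbind(data = t_data[1:3], plot = t_plot[1:3]))
```

Startup cost would have to be measured from the shell, outside the R session, since both timers here start only after the interpreter is already running.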

So it's a personal preference. I co-wrote littler, learned a lot doing that (e.g. for RInside) and still find it useful, so I use it dozens of times each day. It drives CRANberries. It drives cran2deb. Your mileage may, as they say, vary.

Disclaimer: littler is one of my projects.

Postscriptum: I would have written the test as

    # N is the number of repetitions
    fun <- function() { x <- rnorm(100); print(x); print(plot(x)) }
    replicate(N, system.time(fun())["elapsed"])

or even

    mean(replicate(N, system.time(fun())["elapsed"]), trim = 0.05)

to get rid of the outliers. Moreover, you essentially only measure I/O (a print and a plot), both of which come from the R library, so I would expect little difference.
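To make the effect of trim concrete, here is a small sketch with made-up elapsed times (the values are hypothetical, not measurements). With 20 observations, trim = 0.05 drops one value from each end before averaging, so a single slow run no longer pulls the result up.

```r
# Hypothetical elapsed times: 19 typical runs and one slow outlier.
elapsed <- c(rep(0.25, 19), 5)

plain   <- mean(elapsed)               # pulled up by the outlier
trimmed <- mean(elapsed, trim = 0.05)  # drops 5% of the values from each end

print(c(plain = plain, trimmed = trimmed))
```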

Dirk Eddelbuettel answered Sep 28 '22 09:09