
How can I ensure a consistent R environment among different users on the same server?

Tags: r, environment

I am writing a protocol for a reproducible analysis using an in-house package "MyPKG". Each user will supply their own input files; other than the inputs, the analyses should be run under the same conditions, so that we can infer that different results are due to different input files.

MyPKG is under development, so library(MyPKG) will load whichever was the last version that the user compiled in their local library. It will also load any dependencies found in their local libraries.

But I want everyone to use a specific version (MyPKG_3.14) for this analysis while still allowing development of more recent versions. If I understand correctly, "R --vanilla" will load the same dependencies for everyone.

Once we are done, we will save the working environment as a VM to maintain a stable reproducible environment. So a temporary (6 month) solution will suffice.

I have come up with two potential solutions, but am not sure if either is sufficient.

  1. ask the server admin to install MyPKG_3.14 into the default R path and then provide the following code in the protocol:

    R --vanilla
    library(MyPKG)
    ....
    

    or

  2. compile MyPKG_3.14 in a specific library, e.g. lib.loc = "/home/share/lib/R/MyPKG_3.14", and then provide

    R --vanilla
    library(MyPKG, lib.loc = "/home/share/lib/R/MyPKG_3.14")
    

  • Are both of these approaches sufficient to ensure that everyone is running the same version?
  • Is one preferable to the other?
  • Are there other unforeseen issues that may arise?
  • Is there a preferred option for standardising the multiple analyses?
  • Should I include a test of the output of sessionInfo()?
  • Would it be better to create a single account on the server for everyone to use?
David LeBauer asked Nov 12 '22 22:11


1 Answer

A couple of points:

  • Use system-wide installations of packages. For example, the Debian / Ubuntu binary for R (including the CRAN ports) will try to use /usr/local/lib/R/site-library, which users can also install into if they are added to the group owning the directory. That way everybody gets the same version.
  • Use system-wide configuration, i.e. prefer $R_HOME/etc/ over the dotfiles below ~/. For the same reason, the Debian / Ubuntu package offers softlinks in /etc/R/.
  • Use R's facilities to query its packages (e.g. installed.packages()) to report packages and versions.
  • Use, where available, OS-level facilities to query OS release and version. This, however, is less standardized.
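As a rough illustration of the third point, the sketch below lists every installed package with its version and library path, and stops early if a given package is not at the expected version. The helper names `report_packages()` and `check_pinned()` are made up for this example; "MyPKG", "3.14" and the library path in the commented-out call are taken from the question and assume the package's DESCRIPTION really reports version 3.14.

```r
# Sketch only: report_packages() and check_pinned() are hypothetical helpers,
# not part of R or of any package.

# Tabulate installed packages, their versions, and where they were found.
report_packages <- function(lib.loc = NULL) {
  ip <- installed.packages(lib.loc = lib.loc)
  data.frame(
    Package = unname(ip[, "Package"]),
    Version = unname(ip[, "Version"]),
    LibPath = unname(ip[, "LibPath"]),
    stringsAsFactors = FALSE
  )
}

# Stop with an error unless `pkg` is installed at exactly `version`.
check_pinned <- function(pkg, version, lib.loc = NULL) {
  found <- packageVersion(pkg, lib.loc = lib.loc)
  if (found != version) {
    stop(sprintf("%s %s found, but %s is required",
                 pkg, as.character(found), version))
  }
  invisible(TRUE)
}

pkgs <- report_packages()
print(R.version.string)
print(head(pkgs))

# Assumed usage for the package in the question (path from the question):
# check_pinned("MyPKG", "3.14", lib.loc = "/home/share/lib/R/MyPKG_3.14")
```

Writing the output of `report_packages()` to a file next to each user's results makes it possible to compare environments after the fact, rather than relying on everyone having followed the protocol.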

Regarding the last point, my box at home says

    edd@max:~$ lsb_release -a | tail -4
    Distributor ID: Ubuntu
    Description:    Ubuntu 12.04.1 LTS
    Release:        12.04
    Codename:       precise
    edd@max:~$ 

which is a start.
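The same information can be gathered from inside R, so that it ends up in the same log as the package report. This is only a sketch: `os_report()` is a hypothetical helper, and it assumes that `lsb_release` may or may not be present, so it probes for the command before calling it.

```r
# Sketch only: os_report() is a hypothetical helper, not part of R.
os_report <- function() {
  info <- Sys.info()
  out <- list(
    sysname   = info[["sysname"]],
    release   = info[["release"]],
    machine   = info[["machine"]],
    r_version = R.version.string
  )
  # lsb_release exists only on some Linux distributions, so check first.
  lsb <- unname(Sys.which("lsb_release"))
  if (nzchar(lsb)) {
    out$lsb <- system2(lsb, "-a", stdout = TRUE, stderr = FALSE)
  }
  out
}

os_info <- os_report()
str(os_info)
```

On systems without `lsb_release` the `lsb` element is simply absent, which reflects the caveat above that OS-level queries are less standardized.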

Dirk Eddelbuettel answered Nov 15 '22 11:11