How do I use setwd in a relative way?

Tags:

git

r

Our team uses R scripts in git repos that are shared between several people, across both Mac and Windows (and occasionally Linux) machines. This tends to lead to a bunch of really annoying lines at the top of scripts that look like this:

#path <- 'C:/data-work/project-a/data'
#path <- 'D:/my-stuff/project-a/data'
path = "~/projects/project-a/data"
#path = 'N:/work-projects/project-a/data'
#path <- "/work/project-a/data"
setwd(path)

To run the script, we have to comment/uncomment the correct path variable or the scripts won't run. This is annoying, untidy, and tends to be a bit of a mess in the commit history too.

In the past I've got around this by using shell scripts to set directories relative to the script's location and skipping setwd entirely (and then running ./run-scripts.sh instead of Rscript process.R), but since we have Windows users here, that won't work. Is there a better way to simplify this messy setwd() boilerplate in R?

(side note: in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?)

futuraprime asked Jun 17 '19 10:06


3 Answers

The answer is to not use setwd() at all, ever. R does things a bit differently from Python, for sure, but this is one thing they have in common.

Instead, any scripts you execute should assume they're being run from a common, top-level root folder. When you launch a new R process, its working directory (i.e., what getwd() returns) is the folder the process was launched from.

As an example, if you had this layout:

.
├── data
│   └── mydata.csv
└── scripts
    └── analysis.R

You would run analysis.R from . and analysis.R would reference data/mydata.csv as "data/mydata.csv" (e.g., read.csv("data/mydata.csv", stringsAsFactors = FALSE)). I would keep the shell scripts or Makefiles that run your R scripts, and have the R scripts assume they're being run from the top level of the git repo.

This might look like:

cd . # Wherever `.` above is
Rscript scripts/analysis.R
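On the R side, a minimal sketch of scripts/analysis.R under this convention could fail early with a clear message when it is launched from the wrong directory. The helpers check_project_root() and load_data(), and the choice of the data folder as the marker to check for, are assumptions for illustration, not part of the original answer.

```r
# scripts/analysis.R (sketch): assumes Rscript is launched from the repo root.

# Hypothetical guard: stop with a helpful message if the expected folder is
# not visible from the current working directory.
check_project_root <- function(marker = "data") {
  if (!dir.exists(marker)) {
    stop("Run this script from the project root; no '", marker,
         "' folder found in ", getwd(), call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical loader: the path is relative to the working directory,
# which is the repo root when launched as `Rscript scripts/analysis.R`.
load_data <- function() {
  check_project_root()
  read.csv(file.path("data", "mydata.csv"), stringsAsFactors = FALSE)
}
```

The script would then call load_data() near the top; no setwd() call is needed anywhere, and running it from the wrong folder produces an error instead of a confusing "file not found".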

Further reading:

  • https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
  • https://github.com/jennybc/here_here
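The here package linked above locates the project root for you rather than relying on where the process was launched. Purely to illustrate the idea, a base-R sketch that walks up from a starting directory until it finds a marker such as a .git folder might look like this. find_project_root() and project_path() are hypothetical names, not functions from here, and the .git default is an assumption.

```r
# Hypothetical helper: walk up from `start` until a directory containing
# `marker` (by default a .git folder) is found, and return that directory.
find_project_root <- function(start = getwd(), marker = ".git") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    # dirname() of a filesystem root returns the root itself, so stop there
    if (identical(parent, dir)) {
      stop("No '", marker, "' found at or above ", start, call. = FALSE)
    }
    dir <- parent
  }
}

# Hypothetical convenience wrapper: build a path relative to the project root.
project_path <- function(..., root = find_project_root()) {
  file.path(root, ...)
}
```

With this, read.csv(project_path("data", "mydata.csv")) resolves to the same file from any subfolder of the repository.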

amoeba answered Oct 22 '22 14:10


1) If you are looking for a way to find the path of the currently running script then see:

Rscript: Determine path of the executing script
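As a rough sketch of the approach discussed there: when a script is started with `Rscript path/to/script.R`, R receives the script as a `--file=` command-line argument, which `commandArgs(trailingOnly = FALSE)` exposes. The helper names below are hypothetical, and this only covers the Rscript case.

```r
# Hypothetical helper: path of the script passed to Rscript, taken from the
# "--file=..." argument. Returns NULL when there is no such argument
# (e.g. in an interactive session). Inside source() it still reports the
# top-level script given to Rscript, not the sourced file.
script_path <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) return(NULL)
  normalizePath(sub("^--file=", "", file_arg[1]), mustWork = FALSE)
}

# Hypothetical helper: directory containing that script, or NULL.
script_dir <- function(...) {
  p <- script_path(...)
  if (is.null(p)) NULL else dirname(p)
}
```

A script run via Rscript could then build paths such as file.path(script_dir(), "..", "data", "mydata.csv"), keeping in mind that script_dir() returns NULL in an interactive session.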

2) Another approach is to require that users set an option with a prearranged name in their .Rprofile file. The script can then setwd() to that directory. An attractive aspect of this system is that over time one can forget where various projects are located, and with this system one can just look at the .Rprofile file as a reminder. For example, for projectA each person running the project would put this in their .Rprofile:

options(projectA = "...whatever...")

and then the script would start off with:

proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else stop("Set option 'projectA' to its directory")

One variation of this is to use the current directory if projectA is not defined. Although this may seem more flexible, I personally find the documenting feature of the code above to be a big advantage.

proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else cat("Using", getwd(), "\n")
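If you would rather avoid setwd() altogether, as the first answer recommends, the same option could instead be used to build full paths. This is a minimal sketch; proj_file() is a hypothetical helper, and falling back to getwd() mirrors the variation just above.

```r
# Hypothetical helper: build a path under the directory stored in the
# 'projectA' option (set in each user's .Rprofile), falling back to the
# current working directory when the option is not set.
proj_file <- function(...) {
  root <- getOption("projectA", default = getwd())
  file.path(root, ...)
}

# Usage (no setwd() needed):
# dat <- read.csv(proj_file("data", "mydata.csv"))
```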

G. Grothendieck answered Oct 22 '22 12:10


in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?

R itself unfortunately doesn’t have a way to do this. But you can achieve the same result in either of two ways:

  • Use packages instead of scripts wherever you currently include code via source. Then you can use the solution outlined in amoeba’s answer. This works because the real issue is that R has no way of telling the source function where to look for scripts.
  • Use box::use instead of source. The ‘box’ package provides a module system that allows relative imports of code modules. A nice side-effect of this is that the package provides a function that tells you the path of the current script, just like in Python (and, just like in Python, you normally don’t need to use this function directly).

Konrad Rudolph answered Oct 22 '22 13:10