
Risks of using setwd() in a script?

Tags:

r

setwd

I've heard it said that it is bad practice to use setwd() in a script.

  • What are the risks/dangers associated with it?
  • What are better alternatives?
Ricardo Saporta asked Dec 07 '12 20:12



People also ask

What does the function setwd() do?

setwd() returns the current directory before the change, invisibly and with the same conventions as getwd(). It gives an error if it does not succeed (including if it is not implemented).
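Because setwd() hands back the previous directory, code that really must switch directories can put the old one back when it finishes. The following is a minimal base-R sketch of that pattern; the helper name with_temp_wd() is hypothetical and not part of base R.

```r
# Hypothetical helper: run `fun` with `dir` as the working directory,
# then restore the previous directory, even if `fun` throws an error.
with_temp_wd <- function(dir, fun) {
  old <- setwd(dir)                 # setwd() invisibly returns the old directory
  on.exit(setwd(old), add = TRUE)   # runs when with_temp_wd() exits, normally or not
  fun()
}

start <- getwd()
inner <- with_temp_wd(tempdir(), getwd)  # getwd() as seen inside the temporary dir
```

After the call, getwd() reports the same directory as before, so the rest of the script is unaffected.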

Why is setwd() bad?

What's wrong with setwd()? The chance of the setwd() command having the desired effect – making the file paths work – for anyone besides its author is 0%. It's also unlikely to work for the author one or two years or computers from now. The project is not self-contained and portable.

Which function tells you which directory the R workspace is pointing to: setwd(), setws(), getwd(), or getws()?

R is always pointed at a directory on your computer. You can find out which directory by running the getwd (get working directory) function; this function has no arguments. To change your working directory, use setwd and specify the path to the desired folder.

What is the use of working directory in R?

The working directory is just a file path on your computer that sets the default location of any files you read into R, or save out of R. In other words, a working directory is like a little flag somewhere on your computer which is tied to a specific analysis project.
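To see the "flag" in action, the sketch below creates a throwaway folder under tempdir() (the folder name demo_project and file name data_demo.csv are made up for illustration) and checks the same relative path from two different working directories. It switches directories only to demonstrate how relative paths are resolved, and restores the original directory at the end.

```r
# Create a throwaway project folder containing one data file.
proj <- file.path(tempdir(), "demo_project")
dir.create(proj, showWarnings = FALSE)
writeLines("x,y\n1,2", file.path(proj, "data_demo.csv"))

old <- setwd(proj)
found_inside <- file.exists("data_demo.csv")   # resolved against proj
setwd(tempdir())
found_outside <- file.exists("data_demo.csv")  # same relative path, different directory
setwd(old)                                     # put the original directory back

c(inside = found_inside, outside = found_outside)
```

The relative path "data_demo.csv" finds the file only when the working directory is the folder that contains it.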


1 Answer

It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
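As an illustration of the difference, here is a small sketch contrasting a machine-specific absolute path with a relative one built by file.path(). The directory and file names (data, survey.csv, the C:\Users\me path) are hypothetical.

```r
# Non-portable: an absolute Windows path that exists on only one machine.
# survey <- read.csv("C:\\Users\\me\\Documents\\analysis\\data\\survey.csv")

# Portable: a path relative to the project's top-level directory.
# file.path() joins components with .Platform$file.sep, which is "/"
# on every platform R supports, and Windows accepts "/" as well.
survey_path <- file.path("data", "survey.csv")
survey_path
```

The relative version works on any machine where the project is laid out the same way, regardless of where the project folder itself lives.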

My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
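One way to follow that approach, sketched under the assumption that the project's top-level directory can be recognised by a marker file (analysis.R here, a hypothetical name), is to keep the optional setwd() line commented out at the top and fail early with a clear message if the script is started from the wrong place.

```r
# Optional convenience line for the author; others comment it out or edit it.
# setwd("~/projects/my-analysis")   # hypothetical path

# Hypothetical check: stop early if R was not started in the project directory,
# identified here by the presence of a marker file.
check_project_root <- function(marker = "analysis.R") {
  if (!file.exists(marker)) {
    stop("Please start R in the project directory (the one containing '",
         marker, "'); the current directory is ", getwd(), call. = FALSE)
  }
  invisible(TRUE)
}
```

After calling check_project_root() once at the top, the rest of the script can use only relative paths such as file.path("data", "survey.csv").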

Yihui Xie (author of knitr) feels particularly strongly about this:

https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0

Whenever you want to manipulate files, they are assumed to be under the same directory of your source (e.g. Rnw documents). Then you can always use relative paths and you will never need to setwd(). Using setwd() contradicts with the principle of reproducibility, e.g. you use setwd('foo/bar/') and the directory may not exist in other people's computers. See FAQ 7: https://github.com/yihui/knitr/blob/master/FAQ.md

And from the aforementioned FAQ 7:

You'd better not do this [change working directory inside knitr code chunks]. Your working directory is always getwd() (all output files will be written here), but the code chunks are evaluated under the directory where your input document comes from. Changing working directories while running R code is a bad practice in general. See #38 for a discussion. You should also try to avoid absolute directories whenever possible (use relative directories instead), because it makes things less reproducible.

See also: https://github.com/yihui/knitr/issues/38

Ben Bolker answered Sep 23 '22 13:09
