
Are there any guidelines for when reproducible code should be included in a publication?

Tags: r

Given the emphasis on reproducible science, I was wondering whether my recent work warrants including example code in the publication. The datasets that I am using are quite big, so it wouldn't necessarily make sense to publish those. However, the statistical methods that I apply within R are not generally known to my audience (although I would think that they should be).

I'm using empirical orthogonal function analysis (EOF) and generalized additive models (GAM) within my analysis. GAM, in particular, is widely used in ecological studies, but less so within the physical sciences - my work spans both disciplines.

I definitely refer to the R packages that I use, and it wouldn't really be difficult for a reviewer / reader to look for those references (and included examples) themselves. So, my question is, what situations are most appropriate for the inclusion of reproducible code in a publication?

Marc in the box asked Jun 21 '12 12:06



2 Answers

Code is the most accurate representation of what you actually did. Therefore, in my view you should always aim to publish code alongside your article.

However, editor resistance to this is pretty strong. The fear is that if the reviewers had access to the code, the journal looks pretty bad when a substantive coding mistake is later found. This is not a hypothetical fear, given the Levitt paper, etc.

Knuth has some strong views on literate programming that you should be able to cite as justification. If you can't convince the journal to accept your code as an integral piece of the publication, consider publishing it on your personal website (the approach taken by e.g. Raj Chetty for many of his papers) or as an R package.

Finally, here's a note I wrote to my programming students:

Consider publishing your code. Doing so will act as a commitment device which will encourage good habits--habits that make your own work easier. Publishing your code also makes it easier for others to extend your analysis, which can result in more citations of your work. Releasing your code is good academic practice as well: it is the truest testament to your analysis. And offering your program to the world shows off the beautiful coding skills which you are about to acquire.

Ari B. Friedman answered Sep 28 '22 02:09



A basic tenet of science is reproducibility. So the answer would be to "include" the code required to conduct your analysis with every paper/publication that is based on data analysis.

I say "include" because you don't need to put the R code directly into the paper. Many if not most journals allow supplementary material, which is one option. Alternatively, deposit your script with one of the many scientific data archiving sites (such as Figshare) and then (and here is the killer!) cite your own script using the DOI that Figshare gives to your deposited script. If you can post the data too, then all the better; Figshare doesn't really care too much about big data sets.

The above applies to code where you are using other packages and your R script does things like load and format data, call functions from other packages, and then plot or display output/results. If you have developed new R code to implement a particular method, then I would say package the code as an R package and submit that to CRAN or R-Forge or something like that.

From your description, the former (deposit the analysis script in a repo) would be most appropriate.
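As a rough illustration of the kind of script meant here, the following is a minimal sketch only, not the asker's actual analysis. It assumes the mgcv package is installed, and it simulates a small data set in place of the real one; every variable name, dimension, and model formula in it is hypothetical. It computes an EOF decomposition with base R's svd(), fits a GAM with mgcv::gam(), and records the session details so a reader can see which package versions were used.

```r
# Hypothetical analysis script: all data, names, and parameters are made up.
library(mgcv)  # provides gam()

set.seed(42)  # fix the random seed so the simulated data are reproducible

# 1. Load and format data (simulated here in place of the real, large data set)
n_time  <- 120
n_space <- 15
time    <- seq_len(n_time)
signal  <- sin(2 * pi * time / 12)       # shared seasonal pattern
loading <- runif(n_space, 0.5, 1.5)      # per-location amplitude
field   <- outer(signal, loading) +
  matrix(rnorm(n_time * n_space, sd = 0.3), n_time, n_space)

# 2. EOF analysis: SVD of the column-centred (time x space) anomaly matrix
anom     <- scale(field, center = TRUE, scale = FALSE)
dec      <- svd(anom)
eof1     <- dec$v[, 1]                   # leading spatial pattern (EOF 1)
pc1      <- dec$u[, 1] * dec$d[1]        # its time series (PC 1)
expl_var <- dec$d^2 / sum(dec$d^2)       # fraction of variance per mode

# 3. GAM: smooth relationship between a simulated response and PC 1
dat   <- data.frame(pc1 = pc1, y = pc1^2 + rnorm(n_time, sd = 0.5))
model <- gam(y ~ s(pc1), data = dat, method = "REML")

# 4. Display results
print(summary(model))
plot(model, shade = TRUE)

# 5. Record the R and package versions used for this run
print(sessionInfo())
```

Depositing a script like this, with the real data-loading step in place of the simulation, gives readers the exact sequence of calls behind the reported results even when the data themselves cannot be shared.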

Gavin Simpson answered Sep 28 '22 02:09
