
Proofreading the PDF of a book, thesis, or report derived from a large multi-file Sweave project

Tags: r, sweave

I'm a big fan of reproducible research. I often use make, Sweave, LaTeX, and R to produce large research reports (i.e., lots of \Sexpr{} commands and heaps of graphs and tables).

Obviously, R CMD Sweave identifies certain errors in the R code chunks at compilation. But the resulting PDF can still contain undesirable results. I have a few strategies for proofreading such documents, but I was interested in learning from others on SO.

Questions:

  1. Does anyone have any tips or tricks related to proofreading and quality control when it comes to producing PDFs based on large multi-file Sweave projects?
  2. What are the most common errors that you encounter in resulting PDFs?
  3. How do you efficiently identify errors in the resulting PDF?
  4. How do you efficiently move between PDF and Rnw source?
Jeromy Anglim asked Mar 19 '11 06:03


3 Answers

I'm not sure if this is what you are looking for, but most of these problems become less of an issue if you use Emacs, AUCTeX, and Emacs Speaks Statistics (ESS). They are all available in Linux repositories, and there is a precompiled binary available for Windows: http://vgoulet.act.ulaval.ca/en/emacs/windows/

The major advantage of Emacs is that you can have your R console in one window and your TeX source in another, and Emacs will highlight both the LaTeX and the R code appropriately in an .Rnw file, which really helped me spot mistakes. You can also evaluate small regions of R code and preview tables and other objects in TeX. It's definitely a learning curve, but I have been using it for about a month and it has already made me about 50% more productive in my reproducible research. The keybindings are intuitive once you know a few, and Emacs provides modes for almost every programming language under the sun, so the time spent learning it will repay itself over and over. More specifically:

  1. Emacs helps here with syntax highlighting and preview regions, so you can check that particular tables are formatted as you want, with no missing rows or labels.
  2. I normally end up making spelling mistakes and missing-package errors, as I tend to develop my statistical analyses in a number of passes over the document.
  3. Emacs will spot any compilation errors, and the R code can all be tested piece by piece before the whole document is compiled.
  4. If you use the command to Sweave (Alt+m, s) and then compile the LaTeX with C-c C-c (normally twice, to get labels and BibTeX right), another C-c C-c will open the PDF for viewing. Sadly, it doesn't open in Emacs by default, but I imagine there is a package or script that someone has written to enable this.

I'm sure others can give more examples of the usefulness of Emacs for this kind of work; as I said, I am just beginning with it (but it's far better than all the other TeX and R programs I have used: TeXnicCenter, Kile, Texmaker).

I wouldn't recommend it for someone who didn't know both R and LaTeX, but if you do, it makes you orders of magnitude more efficient.

richiemorrisroe answered Nov 20 '22 18:11


Good question. The problems a person sees depend heavily on the work (s)he's doing. For me, the most common non-R problems are misspellings, figures out of whack, an equation with a mistake in it, and so forth.

The most reliable, platform-independent, and efficient error-catching strategy I have found is to export to PDF frequently. Work a little bit; check. Work a little bit more; check again. Yes, this sucks for a large project, though tools like cacheSweave can help. The bottom line: if you work for 2 hours all over the place and then get an error, it's no fun trying to track it down.

With a large project, when I get an error in chunk 287 (or something) it helps to take a moment and tangle the R code. From context I can usually figure out where the error is and navigate there quickly. Another option would be to name the code chunks, but who wants to come up with 591 names?
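Short of tangling everything, a small script can map a chunk number from an error message back to a line of the .Rnw source. This is a minimal sketch in Python rather than R (R's own `Stangle()` does the real tangling). It assumes the usual Noweb-style chunk headers such as `<<label, echo=FALSE>>=` at the start of a line, and it assumes the chunk number Sweave reports counts every code chunk in the file in order; check that against your Sweave version. Files pulled in with `\SweaveInput{}` are not followed. The name `index_chunks` is hypothetical.

```python
import re

# Sweave chunk header at the start of a line, e.g. <<setup, echo=FALSE>>=
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")


def index_chunks(lines):
    """Return (chunk_number, label, start_line) tuples for every chunk header.

    Chunks are numbered 1, 2, 3, ... in order of appearance; start_line is
    the 1-based line number of the header in the source file.
    """
    chunks = []
    for lineno, line in enumerate(lines, start=1):
        match = CHUNK_START.match(line)
        if match:
            # Treat the first option as a label only if it is not key=value.
            first_option = match.group(1).split(",")[0].strip()
            label = first_option if first_option and "=" not in first_option else None
            chunks.append((len(chunks) + 1, label, lineno))
    return chunks
```

For example, `index_chunks(open("report.Rnw", encoding="utf-8"))` gives a list you can search for the entry numbered 287 to jump straight to its header line.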

For the equation/math problems, an editor with in-line preview is helpful. LyX has this, and AUCTeX does too. That way, if you miss a backslash or a comma somewhere, you know instantly because the preview is messed up. This has saved me countless hours.

The inline-preview of images (generated by Sweave) doesn't exist for LyX, but it does for Org-mode. This is a very, very strong plus for the same reason.

I don't really have any other LaTeX errors these days because LyX is WYSIWYM; it generates the LaTeX without me. Org-mode is good in this regard as well. AUCTeX and ESS have tools to help and are OK (RStudio looks similar). I haven't played with Eclipse et al. very much.

Some problems are really hard to notice without studying the logs, like a URL (or table, etc.) running off the page. Export to PDF frequently. Work and check. It's the best way, barring peer review by another set of eyes.
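Text running off the page usually shows up in the LaTeX log as an `Overfull \hbox` warning, so the log can be filtered instead of read end to end. A minimal Python sketch, assuming the standard TeX message format shown in the comment; the name `overfull_boxes` and the `min_pt` threshold are illustrative choices, not part of any tool. The line numbers refer to the .tex file Sweave generates, not to the .Rnw.

```python
import re

# Standard TeX log message for a box that sticks out into the margin, e.g.
#   Overfull \hbox (23.4pt too wide) in paragraph at lines 120--121
OVERFULL = re.compile(r"Overfull \\hbox \(([\d.]+)pt too wide\).*?at lines? (\d+)")


def overfull_boxes(log_text, min_pt=1.0):
    """Return (excess_pt, tex_line) pairs for overfull boxes of at least min_pt."""
    hits = []
    for match in OVERFULL.finditer(log_text):
        excess = float(match.group(1))
        if excess >= min_pt:
            hits.append((excess, int(match.group(2))))
    return hits
```

Running it over the `.log` after each build and ignoring anything below a few points leaves a short list of places to check in the PDF.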

By the way, LyX spell-checks the non-LaTeX markup with aspell.

G. Jay Kerns answered Nov 20 '22 19:11


I'm not sure exactly what you mean by "proofreading," but I find that in LaTeX in general, using lots of \marginpar statements to note problems for future fixing works well. The other way to do it would be to add notes to the final PDF in a good PDF reader, but then they go away when you recompile.
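In a multi-file project the margin notes are easy to lose track of, so it can help to collect them into one list before a final pass. A minimal Python sketch, assuming each note is written as `\marginpar{...}` on a single line with no nested braces, and that the sources are `.Rnw` files in one directory; `margin_notes` and `list_project_notes` are hypothetical names.

```python
import re
from pathlib import Path

# A \marginpar{...} note on one line, with no nested braces inside the note.
MARGINPAR = re.compile(r"\\marginpar\{([^{}]*)\}")


def margin_notes(text):
    """Return (line_number, note) pairs for the simple margin notes in text."""
    notes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARGINPAR.finditer(line):
            notes.append((lineno, match.group(1).strip()))
    return notes


def list_project_notes(directory):
    """Print every simple margin note found in the .Rnw files of a directory."""
    for path in sorted(Path(directory).glob("*.Rnw")):
        for lineno, note in margin_notes(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {note}")
```

Notes containing nested braces (for example `\textbf{...}` inside the note) are skipped rather than mis-parsed, which keeps the sketch simple at the cost of missing some notes.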

For those of us who have permanent hand troubles from using Emacs (not kidding!), a GUI-based option for Sweave is Eclipse. It can be set up for one-click compiling of Sweave, does proper code highlighting, and has the usual IDE features. Eclipse also offers spell checking via a package, which helps with proofreading. I'm not sure whether you can set the spell checker to check only the LaTeX portions, which would be ideal.

RStudio is a new but interesting option as well.

Ari B. Friedman answered Nov 20 '22 18:11