R packages are a collection of R functions, compiled code, and sample data. They are stored under a directory called "library" in the R environment. By default, R installs a set of packages during installation, and only those default packages are available when you start the R console; more packages are added later, when they are needed for some specific purpose.
RMySQL, RPostgreSQL, RSQLite - If you'd like to read in data from a database, these packages are a good place to start. Choose the package that fits your type of database. XLConnect, xlsx - These packages help you read and write Microsoft Excel files from R. You can also just export your spreadsheets from Excel as .csv files and read them in.
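For example, a minimal sketch of reading from a database (my_data.sqlite and the sales table are hypothetical; swap in RMySQL or RPostgreSQL and the matching connection details for other databases):
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "my_data.sqlite")   # open a connection to the file
sales <- dbGetQuery(con, "SELECT * FROM sales")          # read a table into a data frame
dbDisconnect(con)                                        # close the connection when done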
I have written way too many packages, so to keep things manageable I've invested a lot of time in infrastructure packages: packages that help me make my code more robust and help make it easier for others to use. These include:
roxygen2 (with Manuel Eugster and Peter Danenberg), which allows you to keep documentation next to the function it documents, which makes it much more likely that I'll keep it up to date. roxygen2 also has a number of new features designed to minimise documentation duplication: templates (@template), parameter inheritance (@inheritParams), and function families (@family), to name a few. (A short sketch of this style follows the list below.)
testthat automates the testing of my code. This is becoming more and more important as I have less and less time to code: automated tests remember how the function should work, even when I don't.
devtools automates many common development tasks (as Andrie mentioned). The eventual goal for devtools is for it to act like R CMD check running continuously in the background, notifying you the instant that something goes wrong.
profr, particularly the unreleased interactive explorer, makes it easy for me to find bottlenecks in my code.
helpr (with Barret Schloerke), which will soon power http://had.co.nz/ggplot2, provides an elegant HTML interface to R documentation.
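As a rough illustration of the roxygen2 tags and testthat style mentioned above (col_summary() and the test file path are hypothetical examples, not part of any package):
#' Summarise a numeric vector.
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped? Passed on to mean() and sd().
#' @return A named numeric vector with the mean and standard deviation of x.
#' @family summary helpers
#' @export
col_summary <- function(x, na.rm = TRUE) {
    c(mean = mean(x, na.rm = na.rm), sd = sd(x, na.rm = na.rm))
}
A matching test, e.g. in tests/testthat/test-col_summary.R:
library(testthat)
test_that("col_summary computes mean and sd", {
    out <- col_summary(c(1, 2, 3))
    expect_equal(out[["mean"]], 2)
    expect_equal(out[["sd"]], 1)
})
devtools::document(), devtools::test(), and devtools::check() can then rebuild the help files, run the tests, and run R CMD check from within R.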
Useful R functions:
apropos: I'm always forgetting the names of useful functions, and apropos helps me find them, even if I only remember a fragment.
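A small illustration (the exact matches depend on which packages are attached):
apropos("read")       # visible objects whose names contain "read", e.g. read.csv, readLines
apropos("^write\\.")  # regular expressions work too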
Outside of R:
I use TextMate to edit R (and other) files, but I don't think it's really that important. Pick one and learn all its nooks and crannies.
Spend some time to learn the command line. Anything you can do to automate any part of your workflow will pay off in the long run. Running R from the command line leads to a natural process where each project has its own instance of R; I often have 2-5 instances of R running at a time.
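For instance (analysis.R is a hypothetical script in the project directory):
Rscript analysis.R          # run a script non-interactively from the shell
R --no-save < analysis.R    # an older equivalent; each project gets its own R process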
Use version control. I like git and github. Again, it doesn't matter exactly which system you use, but master it!
As I recall this has been asked before and my answer remains the same: Emacs.
Emacs can M-x shell and/or M-x eshell, has nice directory access functionality with dired mode, has ssh mode for remote access, and <tongueInCheek>is not Eclipse and does not require Java</tongueInCheek>.
You can of course combine it with whichever CRAN packages you like: RUnit or testthat, the different profiling support packages, the debug package, ...
Additional tools that are useful:
R CMD check really is your friend, as this is what CRAN uses to decide whether you are "in or out"; use it and trust it.
The tests/ directory can offer a simplified version of unit tests by saving to-be-compared-against output (from a prior R CMD check run); this is useful, but proper unit tests are better.
r -lfoo -e'bar(1, "ab")' starts an R session, loads the foo package, and evaluates the given expression (here a function bar() with two arguments). This, combined with R CMD INSTALL, provides a full test cycle.
Knowledge of, and the ability to use, the basic R debugging tools is an essential first step in learning to quickly debug R code. If you know how to use the basic tools, you can debug code anywhere without needing all the extra tools provided in add-on packages.
traceback() allows you to see the call stack leading to an error:
foo <- function(x) {
    d <- bar(x)    # bar() requires a matrix, so this fails for a plain vector
    x[1]
}
bar <- function(x) {
    stopifnot(is.matrix(x))
    dim(x)
}
foo(1:10)          # triggers the error
traceback()        # shows the call stack at the time of the error
yields:
> foo(1:10)
Error: is.matrix(x) is not TRUE
> traceback()
4: stop(paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE",
sep = ""), call. = FALSE)
3: stopifnot(is.matrix(x))
2: bar(x)
1: foo(1:10)
So we can clearly see that the error happened in function bar(); we've narrowed down the scope of the bug hunt. But what if the code generates warnings, not errors? That can be handled by turning warnings into errors via the warn option:
options(warn = 2) will turn warnings into errors. You can then use traceback() to track them down.
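A minimal sketch (baz() is a hypothetical function that only raises a warning):
baz <- function(x) {
    if (any(x < 0)) warning("negative values found")
    abs(x)
}
options(warn = 2)   # promote warnings to errors
baz(c(1, -1))       # now stops with an error "(converted from warning) negative values found"
traceback()         # so traceback() can locate the offending call
options(warn = 0)   # restore the default behaviour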
Linked to this is getting R to recover from an error in the code so you can debug what went wrong. options(error = recover) will drop us into a debugger frame whenever an error is raised:
> options(error = recover)
> foo(1:10)
Error: is.matrix(x) is not TRUE
Enter a frame number, or 0 to exit
1: foo(1:10)
2: bar(x)
3: stopifnot(is.matrix(x))
Selection: 2
Called from: bar(x)
Browse[1]> x
[1] 1 2 3 4 5 6 7 8 9 10
Browse[1]> is.matrix(x)
[1] FALSE
You see we can drop into each frame on the call stack and see how the functions were called, what the arguments are, etc. In the above example, we see that bar() was passed a vector, not a matrix; hence the error. options(error = NULL) resets this behaviour to normal.
Another key function is trace(), which allows you to insert debugging calls into an existing function. The benefit of this is that you can tell R to debug from a particular line in the source:
> x <- 1:10; y <- rnorm(10)
> trace(lm, tracer = browser, at = 10) ## debug from line 10 of the source
Tracing function "lm" in package "stats"
[1] "lm"
> lm(y ~ x)
Tracing lm(y ~ x) step 10
Called from: eval(expr, envir, enclos)
Browse[1]> n ## must press n <return> to get the next line step
debug: mf <- eval(mf, parent.frame())
Browse[2]>
debug: if (method == "model.frame") return(mf) else if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'",
method), domain = NA)
Browse[2]>
debug: if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'",
method), domain = NA)
Browse[2]>
debug: NULL
Browse[2]> Q
> untrace(lm)
Untracing function "lm" in package "stats"
This allows you to insert the debugging calls at the right point in the code without having to step through the preceding function calls.
If you want to step through a function as it is executing, then debug(foo) will turn on the debugger for function foo(), whilst undebug(foo) will turn it off.
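For example, using the foo() defined earlier:
debug(foo)       # flag foo() for interactive debugging
foo(1:10)        # execution pauses at the first line of foo(); step with n, continue with c, quit with Q
undebug(foo)     # remove the flag
debugonce(foo)   # alternatively, debug only the next call to foo()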
A key point about these options is that I haven't needed to modify or edit any source code to insert debugging calls. I can try things out and see what the problem is directly from the session where the error has occurred.
For a different take on debugging in R, see Mark Bravington's debug package on CRAN.