 

R writing style - require vs. ::


OK, we're all familiar with the double colon operator in R. Whenever I'm about to write a function, I use require(<pkgname>), but I've always thought about using :: instead. Using require in custom functions is better practice than library: if you give it the name of a non-existent package, require issues a warning and returns FALSE, whereas library throws an error.
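
A minimal sketch of the difference described above. The package name notARealPackage123 is hypothetical and assumed not to be installed; suppressWarnings() only hides require()'s warning so the output stays short.

```r
pkg <- "notARealPackage123"   # hypothetical name, assumed not installed

# require() warns and returns FALSE instead of stopping
ok <- suppressWarnings(require(pkg, character.only = TRUE))
print(ok)  # FALSE

# library() signals an error, caught here so execution continues
res <- tryCatch({
  library(pkg, character.only = TRUE)
  "attached"
}, error = function(e) "error")
print(res)  # "error"
```

Because require() hands back a logical, a function can branch on it; with library() the same situation has to be handled with tryCatch().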

On the other hand, the :: operator gets a single object from the package, while require loads the whole package (at least I think so), so speed differences were the first thing that came to mind: :: must be faster than require.
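
For reference, :: is not free of loading cost either: it loads the package's namespace, it just does not attach the package to the search path. This sketch uses the base tools package as a stand-in for foreign (an assumption made so it runs without extra packages; tools is not attached by default).

```r
# tools::file_ext() returns the extension of a file name
ext <- tools::file_ext("Iris.syd")
print(ext)  # "syd"

# :: loaded the namespace...
print("tools" %in% loadedNamespaces())  # TRUE
# ...but did not attach the package to the search path
print("package:tools" %in% search())    # FALSE
```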

So I ran some analysis to check that. I wrote two simple functions that get the read.systat function from the foreign package, one with require and one with ::, and use it to import the Iris.syd dataset that ships with foreign. I replicated each function 1000 times (a shamelessly arbitrary number) and... crunched some numbers.

Strangely (or not), I found significant differences in user CPU and elapsed time, but no significant difference in system CPU. And an even stranger conclusion: :: is actually slower! The documentation for :: is very terse, and just by looking at the sources it seems obvious that :: should perform better!

require

#!/usr/local/bin/r

## with require
fn1 <- function() {
  require(foreign)
  read.systat("Iris.syd", to.data.frame=TRUE)
}

## times
n <- 1e3

sink("require.txt")
print(t(replicate(n, system.time(fn1()))))
sink()

double colon

#!/usr/local/bin/r

## with ::
fn2 <- function() {
  foreign::read.systat("Iris.syd", to.data.frame=TRUE)
}

## times
n <- 1e3


sink("double_colon.txt")
print(t(replicate(n, system.time(fn2()))))
sink()
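
Both scripts also time read.systat reading a file from disk, so the I/O is part of every measurement. A sketch that isolates only the per-call cost of :: versus calling a function by its bare name after attaching the package might look like this. It again uses tools::file_ext as a stand-in (an assumption; no file is read), and no particular result is claimed, since timings depend on the machine.

```r
n <- 1e4
path <- "Iris.syd"   # just a string here; no file is read

# tools:: resolves the exported function on every call
t_colon <- system.time(for (i in seq_len(n)) tools::file_ext(path))

# Attach 'tools' once, then call the function by its bare name
suppressPackageStartupMessages(require(tools))
t_attached <- system.time(for (i in seq_len(n)) file_ext(path))

timings <- c(colon = t_colon[["elapsed"]], attached = t_attached[["elapsed"]])
print(timings)
```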

Grab CSV data here. Some stats:

user CPU:     W = 475366    p-value = 0.04738  MRr =  975.866    MRc = 1025.134
system CPU:   W = 503312.5  p-value = 0.7305   MRr = 1003.8125   MRc =  997.1875
elapsed time: W = 403299.5  p-value < 2.2e-16  MRr =  903.7995   MRc = 1097.2005

MRr is the mean rank for require, MRc likewise for ::. I must have done something wrong here. It just doesn't make any sense... :: ought to be way faster!!! I may have screwed something up, so you shouldn't discard that possibility...
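
The post does not show how these statistics were computed. They are consistent with a two-sample Wilcoxon rank-sum test (R's wilcox.test) with the require timings as the first sample: for user CPU, 1000 × 975.866 − 1000 × 1001 / 2 = 475366, and MRr + MRc = 2001 for every row. A sketch of that computation on simulated timings (NOT the asker's data; the exponential distributions are an arbitrary assumption). Rounding mimics the coarse resolution of system.time(), whose ties explain the half-integer W values above.

```r
set.seed(42)
n <- 1000

# Simulated stand-ins for n elapsed times per approach
t_require <- round(rexp(n, rate = 100), 3)
t_colon   <- round(rexp(n, rate = 90), 3)

test <- wilcox.test(t_require, t_colon)   # two-sample rank-sum test

# Mean ranks in the pooled sample (ties get averaged ranks)
r <- rank(c(t_require, t_colon))
mr_require <- mean(r[seq_len(n)])
mr_colon   <- mean(r[n + seq_len(n)])

# W is the rank sum of the first sample minus n(n + 1) / 2
w_manual <- n * mr_require - n * (n + 1) / 2

print(c(W = unname(test$statistic), MRr = mr_require, MRc = mr_colon,
        p = test$p.value))
```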

OK... I've wasted my time establishing that there is some difference and carried out a completely useless analysis, so back to the question:

"Why should one prefer require over :: when writing a function?"

=)

aL3xa, asked Dec 06 '10 23:12




2 Answers

"Why should one prefer require over :: when writing a function?"

I usually prefer require because its TRUE/FALSE return value lets me deal with a missing package up front, before getting into the code. Crash as early as possible instead of halfway through your analysis.
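
A minimal sketch of that fail-fast pattern. need_packages() is a hypothetical helper written for illustration (not from any package), tools stands in for a real dependency, and notARealPackage123 is a made-up name used to show the failure path.

```r
# Hypothetical helper: attach every package in 'pkgs' or stop immediately
need_packages <- function(pkgs) {
  ok <- vapply(pkgs, function(p) {
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  }, logical(1))
  if (!all(ok)) {
    stop("missing package(s): ", paste(pkgs[!ok], collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# The check runs first; the real work only starts if it passes
file_type <- function(path) {
  need_packages("tools")
  file_ext(path)
}
print(file_type("Iris.syd"))  # "syd"

# A missing package stops before any work is done
msg <- tryCatch(need_packages("notARealPackage123"),
                error = function(e) conditionMessage(e))
print(msg)  # "missing package(s): notARealPackage123"
```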

I only use :: when I need to make sure I am using the correct version of a function, not a version from some other package that is masking the name.
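
A small illustration of masking. To keep it runnable with base R alone, a hypothetical user-defined sd() plays the role of the masking function instead of a second package.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# A hypothetical user-defined function that masks stats::sd
sd <- function(x) "not the standard deviation"

print(sd(x))         # finds the masking definition in the global environment
print(stats::sd(x))  # explicitly uses the stats version
```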

On the other hand, :: operator gets the variable from the package, while require loads whole package (at least I hope so), so speed differences came first to my mind. :: must be faster than require.

I think you may be ignoring the effects of lazy loading, which the foreign package uses according to the first page of its manual. Essentially, packages that use lazy loading defer loading objects, such as functions, until those objects are first called upon. So your argument that ":: must be faster than require" is not necessarily true, as foreign does not load all of its contents into memory when you attach it with require. For full details on lazy loading, see Prof. Ripley's article in R News, Volume 4, Issue 2.
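
Lazy loading relies on promises, values that are evaluated only when first used. As an analogy only (real lazy loading fetches objects from the package's lazy-load database rather than running code like this), base R's delayedAssign() shows the same deferral; the name double_it is hypothetical.

```r
state <- new.env()
state$loaded <- FALSE

# The expression is stored as a promise; it is not evaluated yet
delayedAssign("double_it", {
  state$loaded <- TRUE        # side effect marks when evaluation happens
  function(x) x * 2
})

print(state$loaded)   # FALSE: nothing evaluated yet
print(double_it(21))  # 42: first use forces the promise
print(state$loaded)   # TRUE
```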

Sharpie, answered Nov 03 '22 07:11


Since the time to load a package is almost always small compared to the time you spend trying to figure out what the code you wrote six months ago was about, in this case coding for clarity is the most important thing.

For scripts, having a call to require or library at the start lets you know which packages you need straight away.

Similarly, calling require (or a wrapper like requirePackage in Hmisc or try_require in ggplot2) at the start of a function is the most unambiguous way of showing that you need to use that package.

:: should be reserved for cases when you have naming conflicts between packages – compare, e.g.,

Hmisc::is.discrete

and

plyr::is.discrete

Richie Cotton, answered Nov 03 '22 06:11