OK, we're all familiar with the double colon operator in R. Whenever I'm about to write some function, I use require(<pkgname>), but I was always thinking about using :: instead. Using require in custom functions is better practice than library, since require returns FALSE with a warning, unlike library, which throws an error if you provide the name of a non-existent package.
On the other hand, the :: operator gets the variable from the package, while require loads the whole package (at least I hope so), so speed differences came first to my mind. :: must be faster than require.
And I did some analysis in order to check that: I've written two simple functions that call the read.systat function from the foreign package, with require and :: respectively, to import the Iris.syd dataset that ships with foreign, ran each function 1000 times (which was shamelessly arbitrary), and... crunched some numbers.
Strangely (or not) I found significant differences in terms of user CPU and elapsed time, while there were no significant differences in terms of system CPU. And a yet stranger conclusion: :: is actually slower! The documentation for :: is very blunt, and just by looking at the sources it's obvious that :: should perform better!
require
#!/usr/local/bin/r
## with require
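fn1 <- function() {
  require(foreign)  # attaches foreign; effectively a no-op after the first call
  # assumes Iris.syd has been copied into the working directory
  read.systat("Iris.syd", to.data.frame = TRUE)
}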
## times
n <- 1e3
sink("require.txt")
print(t(replicate(n, system.time(fn1()))))
sink()
double colon
#!/usr/local/bin/r
## with ::
fn2 <- function() {
  # qualified call: foreign is never attached to the search path
  foreign::read.systat("Iris.syd", to.data.frame = TRUE)
}
## times
n <- 1e3
sink("double_colon.txt")
print(t(replicate(n, system.time(fn2()))))
sink()
Grab CSV data here. Some stats:
user CPU: W = 475366 p-value = 0.04738 MRr = 975.866 MRc = 1025.134
system CPU: W = 503312.5 p-value = 0.7305 MRr = 1003.8125 MRc = 997.1875
elapsed time: W = 403299.5 p-value < 2.2e-16 MRr = 903.7995 MRc = 1097.2005
MRr is the mean rank for require, MRc likewise for ::. I must have done something wrong here. It just doesn't make any sense... Execution with :: ought to be way faster!!! I may have screwed something up, you shouldn't discard that option...
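The W statistics and mean ranks above are consistent with a Wilcoxon rank-sum test (wilcox.test) with the require timings as the first sample, although the analysis script itself isn't shown. A minimal sketch of how such numbers could be produced, using made-up timing vectors (req and dc are hypothetical stand-ins, not the real measurements):

```r
set.seed(1)
n <- 1000

# Hypothetical elapsed times in seconds; NOT the data from the post
req <- rexp(n, rate = 1 / 0.010)
dc  <- rexp(n, rate = 1 / 0.011)

# Two-sample Wilcoxon rank-sum test; the statistic is reported as W
wt <- wilcox.test(req, dc)

# Mean ranks of each group within the pooled sample (MRr and MRc above)
r <- rank(c(req, dc))
mr_req <- mean(r[seq_len(n)])
mr_dc  <- mean(r[n + seq_len(n)])

# W is the rank sum of the first sample minus n * (n + 1) / 2
w_from_ranks <- sum(r[seq_len(n)]) - n * (n + 1) / 2
```

The same relation holds for the posted figures: 975.866 * 1000 - 500500 = 475366, which is the W reported for user CPU. A lower mean rank means lower times, so in those data require is the faster of the two.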
OK... I've wasted my time just to see that there is some difference, and I carried out a completely useless analysis, so, back to the question:
"Why should one prefer require over :: when writing a function?"
=)
The double-colon operator :: selects definitions from a particular namespace. In the example above, the transpose function will always be available as base::t , because it is defined in the base package. Only functions that are exported from the package can be retrieved in this way.
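A small sketch of that selection behaviour, assuming a user-defined t (hypothetical, for illustration only) that masks the base function for unqualified calls:

```r
m <- matrix(1:6, nrow = 2)

# A user-defined function named t now shadows base::t for plain t() calls
t <- function(x) "masked"

unqualified <- t(m)        # dispatches to the user-defined t
qualified   <- base::t(m)  # always the transpose exported by base

# Remove the illustrative definition so plain t() means transpose again
rm(t)
```

Here unqualified holds the string returned by the masking function, while qualified is the 3 x 2 transpose of m.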
"Why should one prefer require over :: when writing a function?"
I usually prefer require due to the nice TRUE/FALSE return value that lets me deal with the possibility of the package not being available up front, before getting into the code. Crash as early as possible instead of halfway through your analysis.
I only use :: when I need to make sure I am using the correct version of a function, not a version from some other package that is masking the name.
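One way the "crash early" idea could look in practice. This is only a sketch; need_package is a hypothetical helper name, and the missing package name is made up on purpose:

```r
# Hypothetical helper: stop right away if a package cannot be attached
need_package <- function(pkg) {
  # require() returns TRUE or FALSE instead of throwing an error
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    stop("package '", pkg, "' is needed but is not installed", call. = FALSE)
  }
  invisible(TRUE)
}

# stats ships with every R installation, so this succeeds
ok_result <- need_package("stats")

# A deliberately non-existent package triggers the early failure
missing_msg <- tryCatch(
  need_package("notARealPackage123"),
  error = function(e) conditionMessage(e)
)
```

Calling the helper at the top of a function or script surfaces a missing dependency before any real work starts.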
On the other hand, :: operator gets the variable from the package, while require loads whole package (at least I hope so), so speed differences came first to my mind. :: must be faster than require.
I think you may be ignoring the effects of lazy loading, which is used by the foreign package according to the first page of its manual. Essentially, packages that use lazy loading defer the loading of objects, such as functions, until the objects are called upon for the first time. So your argument that ":: must be faster than require" is not necessarily true, as foreign is not loading all of its contents into memory when you attach it with require. For full details on lazy loading, see Prof. Ripley's article in RNews, Volume 4, Issue 2.
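Related to the speed argument: :: still has to load the package's namespace the first time it is used; what it skips is attaching the package to the search path. A quick sketch of the difference, using the tools package (chosen here only because it ships with R and is not attached by default in a standard session):

```r
# Is tools on the search path before we touch it?
attached_before <- "package:tools" %in% search()

# A qualified call loads the namespace if needed, but does not attach it
ext <- tools::file_ext("Iris.syd")

loaded_after   <- "tools" %in% loadedNamespaces()
attached_after <- "package:tools" %in% search()

# require() loads the namespace and also attaches the package
attached_ok       <- require(tools, quietly = TRUE)
attached_finally  <- "package:tools" %in% search()
```

After the qualified call the namespace is loaded but the search path is unchanged; only require puts package:tools on it.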
Since the time to load a package is almost always small compared to the time you spend trying to figure out what the code you wrote six months ago was about, in this case coding for clarity is the most important thing.
For scripts, having a call to require or library at the start lets you know which packages you need straight away.
Similarly, calling require (or a wrapper like requirePackage in Hmisc or try_require in ggplot2) at the start of a function is the most unambiguous way of showing that you need to use that package.
:: should be reserved for cases when you have naming conflicts between packages; compare, e.g.,
Hmisc::is.discrete
and
plyr::is.discrete