Are built-in functions in R usually optimized?

I have written some code to compute the correlation coefficient in R. However, I just found out that the 'boot' package offers a corr() function which does the same job. Are built-in functions in R usually more efficient and faster than the equivalent ones we write from scratch?

Thank you.

Concerned_Citizen asked Jul 12 '11 16:07




2 Answers

I don't think there is a single answer to this question, as it will vary wildly depending on the specific function you are asking about. Some functions in contributed packages are added as a convenience and are simply wrappers around base functions. Others are added to extend the base functionality or to address some other perceived deficit in the base functions. Some, as you suggest, are added to reduce computation time or otherwise be more efficient. And others are added because the authors of the contributed packages feel that the solutions in base R are simply wrong in some way.

In the case of stats::cor() and boot::corr(), it looks like the latter adds a weighting capability. It does not appear to be any faster; in this quick test it is actually slower:

> library(boot)  # corr() lives in the boot package
> dat <- matrix(rnorm(1e6), ncol = 2)
> system.time(
+ cor(dat[, 1], dat[, 2])
+ )
   user  system elapsed 
   0.01    0.00    0.02 
> system.time(
+ corr(dat)
+ )
   user  system elapsed 
   0.11    0.00    0.11 
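A single system.time() call on a fast function is close to the timer's resolution, so here is a slightly fuller sketch. It checks that a hand-written version, cor() and corr() agree, shows the weighting capability of corr() with integer frequency weights, and times repeated calls. The function my_cor(), the correlated test data, the weight vector and reps are hypothetical choices for this example, not anything from the original post; timings will differ by machine, so none are quoted.

```r
library(boot)  # recommended package shipped with R; provides corr()

# Hypothetical hand-written Pearson correlation, standing in for the
# "code written from scratch" mentioned in the question.
my_cor <- function(x, y) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

set.seed(1)
n <- 5e5
x <- rnorm(n)
dat <- cbind(x, 0.6 * x + rnorm(n))  # columns with a true correlation of about 0.5

# 1. All three should give the same coefficient, up to rounding.
r_base <- cor(dat[, 1], dat[, 2])
r_boot <- corr(dat)                  # default weights: 1/n for every row
r_mine <- my_cor(dat[, 1], dat[, 2])
print(all.equal(r_base, r_boot))
print(all.equal(r_base, r_mine))

# 2. The extra feature of corr(): weights. corr() divides by sum(w), so
#    integer weights should behave like repeating each row w[i] times.
d <- dat[1:1000, ]
w <- sample(1:3, nrow(d), replace = TRUE)
r_weighted <- corr(d, w)
expanded <- d[rep(seq_len(nrow(d)), times = w), ]
r_expanded <- cor(expanded[, 1], expanded[, 2])
print(all.equal(r_weighted, r_expanded))

# 3. Repeat each call so the differences rise above the timer's resolution.
reps <- 20
print(system.time(for (i in seq_len(reps)) cor(dat[, 1], dat[, 2])))
print(system.time(for (i in seq_len(reps)) corr(dat)))
print(system.time(for (i in seq_len(reps)) my_cor(dat[, 1], dat[, 2])))
```

The weighted check works because Pearson's correlation is a ratio in which the n versus n - 1 denominators cancel, so the weighted population moments used by corr() and the sample moments used by cor() on the expanded rows give the same coefficient.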
Chase answered Sep 27 '22 23:09


Not counting poorly written code, this more or less boils down to whether a given procedure is implemented in R itself or in compiled C/C++ or Fortran code. If the function contains a call to .Internal, .External, .C, .Fortran or .Call, it is the second case, and it will probably run faster. Note that this is orthogonal to the question of whether the function comes from base R or from a package.
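One way to apply this rule of thumb is to inspect a function's source. The helper calls_compiled_code() below is a hypothetical sketch, not part of base R or any package: it reports TRUE if the function is a primitive (implemented in C) or if its own body mentions one of the interfaces listed above. It only looks one level deep, so a pure-R wrapper around cor() would still come back FALSE.

```r
# Hypothetical helper: does f itself call into compiled code?
# Only f's own body is scanned, not the functions it calls.
calls_compiled_code <- function(f) {
  if (is.primitive(f)) return(TRUE)  # primitives such as sum() are written in C
  src <- paste(deparse(body(f)), collapse = "\n")
  grepl("\\.(Internal|External2?|C|Fortran|Call)\\(", src)
}

calls_compiled_code(sum)          # TRUE: a primitive
calls_compiled_code(stats::cor)   # TRUE: its body calls .Call(C_cor, ...)
calls_compiled_code(boot::corr)   # FALSE: plain R arithmetic built on sum()
```

This is consistent with the timings in the previous answer: corr() is ordinary R code made of several vectorised sum() calls, while cor() hands the whole computation to C in one call.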

However, always remember that efficiency is relative: it must be judged in the context of the whole task and weighed against the programmer effort needed to speed something up. It is equally pointless to shave a run time from 1 s down to 10 ms, to rewrite everything in base R just because packages are supposedly evil, or to spend a few hours optimizing function A while 90% of the actual execution time is spent in function B.
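Before optimizing anything, it helps to measure where that 90% actually goes, and R's built-in sampling profiler (Rprof() with summaryRprof()) does this. The sketch below uses a made-up workload: step_a(), step_b() and pipeline() are hypothetical functions, and the exact numbers will differ between machines. The point is only that the profile should attribute most of the total time to step_b(), so that is where effort would pay off.

```r
# Hypothetical workload: step_a() is cheap, step_b() dominates the run time.
step_a <- function(x) sum(x)      # vectorised; the work happens in C
step_b <- function(x) {           # explicit R-level loop over every element
  s <- 0
  for (v in x) s <- s + v^2
  s
}
pipeline <- function(n, iters = 20) {
  x <- runif(n)
  for (i in seq_len(iters)) {
    step_a(x)
    step_b(x)
  }
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start sampling the call stack
pipeline(1e6)
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.total))  # time per function including callees, largest first
```

Reading by.total (or by.self) first shows which function is worth a few hours of work; speeding up step_a() here would barely change the total.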

mbq answered Sep 28 '22 00:09