
Where can I learn how to write C code to speed up slow R functions? [closed]

Tags: r, rcpp



Well, there is the good old "Use the source, Luke!": R itself has plenty of (very efficient) C code one can study, and CRAN has hundreds of packages, some from authors you trust. That provides real, tested examples to study and adapt.

But as Josh suspected, I lean more towards C++ and hence Rcpp. It also has plenty of examples.

Edit: There were two books I found helpful:

  • The first one is Venables and Ripley's "S Programming" even though it is getting long in the tooth (and there have been rumours of a 2nd edition for years). At the time there was simply nothing else.
  • The second is Chambers' "Software for Data Analysis", which is much more recent and has a much nicer R-centric feel, plus two chapters on extending R. Both C and C++ get mentioned. Plus, John shreds me for what I did with digest, so that alone is worth the price of admission.

That said, John is growing fond of Rcpp (and contributing) as he finds the match between R objects and C++ objects (via Rcpp) to be very natural -- and ReferenceClasses help there.

Edit 2: With Hadley's refocussed question, I very strongly urge you to consider C++. There is so much boilerplate nonsense you have to do with C---very tedious and very avoidable. Have a look at the Rcpp-introduction vignette. Another simple example is this blog post where I show that instead of worrying about 10% differences (in one of the Radford Neal examples) we can get eightyfold increases with C++ (on what is of course a contrived example).

Edit 3: There is complexity in that you may run into C++ errors that are, to put it mildly, hard to grok. But if you just use Rcpp rather than extend it, you should hardly ever run into them. And while this cost is undeniable, it is far eclipsed by the benefit of simpler code, less boilerplate, no PROTECT/UNPROTECT, no memory management etc pp. Doug Bates just yesterday stated that he finds C++ and Rcpp to be much more like writing R than writing C++. YMMV and all that.


Hadley,

You can definitely write C++ code that is similar to C code.

I understand what you say about C++ being more complicated than C. That is true if you want to master everything: objects, templates, the STL, template metaprogramming, etc. Most people don't need these things and can just rely on others to handle them. The implementation of Rcpp is very complicated, but just because you don't know how your fridge works does not mean you cannot open the door and grab fresh milk ...

From your many contributions to R, what strikes me is that you find R somewhat tedious (data manipulation, graphics, string manipulation, etc.). Well, get prepared for many more surprises with the internal C API of R. It is very tedious.

From time to time, I read the R-exts or R-ints manuals. This helps. But most of the time, when I really want to find out about something, I go into the R source, and also in the source of packages written by e.g. Simon (there is usually lots to learn there).

Rcpp is designed to make these tedious aspects of the API go away.

You can judge for yourself what you find more complicated, obfuscated, etc ... based on a few examples. This function creates a character vector using the C API:

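#include <R.h>
#include <Rinternals.h>

SEXP foobar(){
  SEXP ab;
  /* allocate a character vector of length 2 and protect it from the GC */
  PROTECT(ab = allocVector(STRSXP, 2));
  SET_STRING_ELT( ab, 0, mkChar("foo") );
  SET_STRING_ELT( ab, 1, mkChar("bar") );
  UNPROTECT(1);
  return ab;
}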

Using Rcpp, you can write the same function as:

SEXP foobar(){
   return Rcpp::CharacterVector::create( "foo", "bar" ) ;
}

or:

SEXP foobar(){
   Rcpp::CharacterVector res(2) ;
   res[0] = "foo" ;
   res[1] = "bar" ;
   return res ;
}

As Dirk said, there are other examples in the several vignettes. We also usually point people towards our unit tests, because each of them tests a very specific part of the code and is somewhat self-explanatory.

I'm obviously biased here, but I would recommend getting familiar with Rcpp instead of learning the C API of R, and then coming to the mailing list if something is unclear or does not seem doable with Rcpp.

Anyway, end of the sales pitch.

I guess it all depends what sort of code you want to write eventually.

Romain


@hadley: unfortunately, I don't have specific resources in mind to help you get started with C++. I picked it up from Scott Meyers's books (Effective C++, More Effective C++, etc.), but these are not really what one could call introductory.

We almost exclusively use the .Call interface to call C++ code. The rules are simple enough:

  • The C++ function must return an R object. All R objects are SEXP.
  • The C++ function takes between 0 and 65 R objects as input (again SEXP).
  • It must (not really, but we can save this for later) be declared with C linkage, either with extern "C" or the RcppExport alias that Rcpp defines.

So a .Call function gets declared like this in some header file:

#include <Rcpp.h>

RcppExport SEXP foo( SEXP x1, SEXP x2 ) ;

and implemented like this in a .cpp file :

SEXP foo( SEXP x1, SEXP x2 ){
   ...
}

There is not much more you need to know about the R API to use Rcpp.

Most people only want to deal with numeric vectors in Rcpp. You do this with the NumericVector class. There are several ways to create a numeric vector :

From an existing object that you pass down from R:

 SEXP foo( SEXP x_) {
    Rcpp::NumericVector x( x_ ) ;
    ...
 }

With given values using the ::create static function:

 Rcpp::NumericVector x = Rcpp::NumericVector::create( 1.0, 2.0, 3.0 ) ;
 // named elements; the _ placeholder lives in the Rcpp namespace
 Rcpp::NumericVector y = Rcpp::NumericVector::create( 
    Rcpp::_["a"] = 1.0, 
    Rcpp::_["b"] = 2.0, 
    Rcpp::_["c"] = 3.0
 ) ;

Of a given size:

 Rcpp::NumericVector zeros( 10 ) ;     // filled with 0.0
 Rcpp::NumericVector twos( 10, 2.0 ) ; // filled with 2.0

Then once you have a vector, the most useful thing is to extract one element from it. This is done with the operator[], with 0-based indexing, so for example summing values of a numeric vector goes something like this:

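SEXP sum( SEXP x_ ){
   Rcpp::NumericVector x(x_) ;        // view the R object as a NumericVector
   double res = 0.0 ;
   for( int i=0; i<x.size(); i++){    // 0-based indexing
      res += x[i] ;
   }
   return Rcpp::wrap( res ) ;         // convert the double back to an R object (SEXP)
}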

But with Rcpp sugar we can do this much more nicely now:

using namespace Rcpp ;
SEXP sum( SEXP x_ ){
   NumericVector x(x_) ;
   double res = Rcpp::sum( x ) ;  // sugar sum; qualified so it is not confused with this function's own name
   return wrap( res ) ;
}

As I said before, it all depends on what sort of code you want to write. Look into what people do in packages that rely on Rcpp, check the vignettes, the unit tests, come back to us on the mailing list. We are always happy to help.


@jbremnant: That's right. Rcpp classes implement something close to the RAII pattern. When an Rcpp object is created, the constructor takes appropriate measures to ensure the underlying R object (SEXP) is protected from the garbage collector. The destructor withdraws the protection. This is explained in the Rcpp-introduction vignette. The underlying implementation relies on the R API functions R_PreserveObject and R_ReleaseObject.
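The constructor/destructor pairing described above is the RAII idiom. Here is a minimal plain-C++ sketch of that idea; it is not Rcpp's implementation. `preserve_object` and `release_object` are hypothetical stand-ins for R_PreserveObject and R_ReleaseObject that only update a counter, and `Handle` is a hypothetical placeholder for SEXP, so the sketch compiles without R.

```cpp
#include <cassert>

// Hypothetical placeholder for an R object handle (SEXP in the real API).
struct Handle { int id; };

// Count of currently protected objects; only here to make the effect observable.
static int g_protected = 0;

// Hypothetical stand-ins for R_PreserveObject / R_ReleaseObject.
void preserve_object(Handle) { ++g_protected; }
void release_object(Handle) {
    assert(g_protected > 0);  // releasing more than was preserved is a bug
    --g_protected;
}

// RAII wrapper: protection is acquired in the constructor and withdrawn
// in the destructor, so it is released on every exit path of the scope.
class Protected {
public:
    explicit Protected(Handle h) : h_(h) { preserve_object(h_); }
    ~Protected() { release_object(h_); }

    // Copying would release the same handle twice; forbid it in this sketch.
    Protected(const Protected&) = delete;
    Protected& operator=(const Protected&) = delete;

    Handle get() const { return h_; }

private:
    Handle h_;
};
```

Because the release sits in the destructor, it also runs when an exception unwinds the scope, so there is no manual release call to match on each exit path.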

There is indeed a performance penalty due to C++ encapsulation. We try to keep it to a minimum with inlining, etc. The penalty is small, and when you take into account the time saved in writing and maintaining code, it is not that relevant.

Calling R functions from the Rcpp class Function is slower than directly calling eval with the C api. This is because we take precautions and wrap the function call into a tryCatch block so that we capture R errors and promote them to C++ exceptions so that they can be dealt with using the standard try/catch in C++.
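The general pattern of promoting an error reported by a C-style call into a C++ exception can be sketched without R. In this hypothetical example, `c_style_divide` reports failure through a status code the way a C API might, and `checked_divide` turns that status into a `std::runtime_error` so callers can use an ordinary try/catch. Rcpp's real mechanism for R errors goes through R's own error handling and is more involved; only the shape of the idea is shown.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical C-style function: returns 0 on success, nonzero on error,
// and writes the result through an out-parameter.
int c_style_divide(double num, double den, double* out) {
    assert(out != nullptr);  // caller must supply storage for the result
    if (den == 0.0) return 1;  // error: division by zero
    *out = num / den;
    return 0;
}

// Wrapper that promotes the error code to a C++ exception.
double checked_divide(double num, double den) {
    double result = 0.0;
    int status = c_style_divide(num, den, &result);
    if (status != 0) {
        throw std::runtime_error("division by zero (status " +
                                 std::to_string(status) + ")");
    }
    return result;
}
```

The extra check on every call is the kind of overhead the paragraph above refers to: safety is bought with a little work per call.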

Most people want to use vectors (especially NumericVector), and the penalty is very small with this class. The examples/ConvolveBenchmarks directory contains several variants of the notorious convolution function from R-exts, and the vignette has benchmark results. It turns out that Rcpp makes it faster than the benchmark code that uses the R API.
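For reference, the convolution being benchmarked is the double loop from the R-exts manual. Below is a plain C++ sketch of that loop using `std::vector` instead of R vectors, so it runs without R; it illustrates what the benchmarked variants compute, not Rcpp's code or its timings. The function name `convolve` is chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full discrete convolution of a and b: the result has na + nb - 1 elements,
// and out[i + j] accumulates a[i] * b[j], as in the R-exts example.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    std::vector<double> out(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            out[i + j] += a[i] * b[j];
        }
    }
    return out;
}
```

For example, convolving {1, 2, 3} with {0, 1, 0.5} gives {0, 1, 2.5, 4, 1.5}.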