Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Return a dynamic vector from C to R

I am writing some code in C to be called dynamically from R.

This code generates a path of a random Poisson process up to a desired time T. So at every call to my C function, the length of the returned vector will be different depending on the random numbers generated.

What R data structure will I have to create? a LISTSXP? another one?

How can I create it, and how can I append to it? And especially how can I return it back to R?

Thanks for help.

like image 608
Alberto Contador Avatar asked Jan 09 '12 23:01

Alberto Contador


2 Answers

If you're open to switching from C to C++, you get added layer of Rcpp for free. Here is an example from the page of the (still rather incomple) RcppExample package:

#include <RcppClassic.h>
#include <cmath>

RcppExport SEXP newRcppVectorExample(SEXP vector) {
BEGIN_RCPP

    Rcpp::NumericVector orig(vector);                // keep a copy 
    Rcpp::NumericVector vec(orig.size());            // create vector same size

    // we could query size via
    //   int n = vec.size();
    // and loop over the vector, but using the STL is so much nicer
    // so we use a STL transform() algorithm on each element
    std::transform(orig.begin(), orig.end(), vec.begin(), ::sqrt);

    return Rcpp::List::create(Rcpp::Named( "result" ) = vec,
                              Rcpp::Named( "original" ) = orig) ;

END_RCPP
}

As you see, no explicit memory allocation, freeing, PROTECT/UNPROTECT etc, and you get a first class R list object back.

There are lots more examples, included in other SO questions such as this one.

Edit: And you didn't really say what you paths would do, but as a simple illustration, here is C++ code using the Rcpp additions cumsum() and rpois() which behave just like they do in R:

R> library(inline)
R> 
R> fun <- cxxfunction(signature(ns="integer", lambdas="numeric"),
+                    plugin="Rcpp",
+                    body='
+    int n = Rcpp::as<int>(ns);
+    double lambda = Rcpp::as<double>(lambdas);
+ 
+    Rcpp::RNGScope tmp;                  // make sure RNG behaves
+ 
+    Rcpp::NumericVector vec = cumsum( rpois( n, lambda ) );
+ 
+    return vec;
+ ')
R> set.seed(42)
R> fun(3, 0.3)
[1] 1 2 2
R> fun(4, 0.4)
[1] 1 1 1 2

And as a proof, back in R, if we set the seed, we can generate exactly the same numbers:

R> set.seed(42)
R> cumsum(rpois(3, 0.3))
[1] 1 2 2
R> cumsum(rpois(4, 0.4))
[1] 1 1 1 2
R> 
like image 149
Dirk Eddelbuettel Avatar answered Nov 18 '22 20:11

Dirk Eddelbuettel


It's really up to you what you want to use as temporary structure, because in the end you'll have to allocate a vector for result anyway. So whatever you'll be using is not what you'll return. There are several possibilities:

  1. use Calloc + Realloc + Free which allows you to expand the temporary memory as needed. Once you have the full set, you allocate the result vector and return it.
  2. if you can overestimate the size easily, you can over-allocate the result vector and use SETLENGTH before you return it. There are issues with this, though, because the result will remain over-allocated until duplicated later on.
  3. You can use what you hinted at which is a chained list of vector blocks, i.e. allocate and protect one pairlist of which you append vectors to the tail as you need them. At the end you allocate the return vector and copy content of the blocks you allocated. This is more convoluted than both of the above.

Each of them has drawbacks and benefits, so it really depends on your application to pick the one best suited for you.

Edit: added an example of using the pairlists approach. I would still recommend the Realloc approach since it's much easier, but nonetheless:

#define BLOCK_SIZE xxx /* some reasonable size for increments - could be adaptive, too */ 
SEXP block; /* last vector block */
SEXP root = PROTECT(list1(block = allocVector(REALSXP, BLOCK_SIZE)));
SEXP tail = root;
double *values = REAL(block);
int count = 0, total = 0;
do { /* your code to generate values - if you want to add one 
        first try to add it to the existing block, otherwise allocate new one */
    if (count == BLOCK_SIZE) { /* add a new block when needed */
        tail = SETCDR(tail, list1(block = allocVector(REALSXP, BLOCK_SIZE)));
        values = REAL(block);
        total += count;
        count = 0;
    }
    values[count++] = next_value;
} while (...);
total += count;
/* when done, we need to create the result vector */
{
    SEXP res = allocVector(REALSXP, total);
    double *res_values = REAL(res);
    while (root != R_NilValue) {
        int size = (CDR(root) == R_NilValue) ? count : BLOCK_SIZE;
        memcpy(res_values, REAL(CAR(root)), sizeof(double) * size);
        res_values += size;
        root = CDR(root);
    }
    UNPROTECT(1);
    return res;
 }
like image 6
Simon Urbanek Avatar answered Nov 18 '22 20:11

Simon Urbanek