 

Rcpp proxy model and R memory allocation

Tags: r, rcpp

I am trying to get a better understanding of how the Rcpp proxy model works.

For this, consider the following task: sample exponential random variables and do something with the result. A naive Rcpp implementation could be:

NumericMatrix rmexp1(int n, int d) {
  NumericMatrix out(n, d);
  NumericVector values;
  for (int k=0; k<n; k++) {
    values = Rcpp::rexp(d);
    // do something with values 
    out(k, _) = values;
  }
  return out;
}

Are the following statements correct?

  • In each iteration, in l#5 (values = Rcpp::rexp(d);), Rcpp::rexp allocates space for a new R vector; values then stores a reference to it and discards the reference it previously held.
  • In l#7 (out(k, _) = values;), the elements of values are hard-copied into out(k, _), since the left- and right-hand-side data types are different.
  • If this is the case, a lot of memory is allocated for R objects without any real need. Should that be avoided if speed is an issue?
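To make the first two statements concrete, here is a plain C++ model of what they would mean (a sketch under assumptions; it does not use Rcpp, and whether Rcpp behaves exactly like this is what the question asks). A std::shared_ptr stands in for an R vector handle, so reassigning it rebinds to a freshly allocated buffer without copying any elements. The hypothetical ColMajorMatrix::assign_row stands in for out(k, _) = values: because R stores matrices column by column, the elements of row k sit n positions apart in memory, so the row cannot simply point at the vector and the elements are copied one by one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an R numeric vector: a handle sharing ownership of
// a heap buffer (an analogy only; Rcpp actually wraps an R SEXP).
using Handle = std::shared_ptr<std::vector<double>>;

// Hypothetical stand-in for Rcpp::rexp(d): every call allocates a new buffer.
Handle fresh_vector(std::size_t d, double fill) {
    return std::make_shared<std::vector<double>>(d, fill);
}

// Column-major n x d matrix, the storage layout R uses for matrices.
struct ColMajorMatrix {
    std::size_t n, d;
    std::vector<double> data;
    ColMajorMatrix(std::size_t n_, std::size_t d_)
        : n(n_), d(d_), data(n_ * d_, 0.0) {}

    // Analogue of out(k, _) = values: element k + j*n receives values[j],
    // so row elements are n apart and must be copied individually.
    void assign_row(std::size_t k, const std::vector<double>& values) {
        assert(k < n && values.size() == d);
        for (std::size_t j = 0; j < d; ++j) data[k + j * n] = values[j];
    }

    double operator()(std::size_t i, std::size_t j) const { return data[i + j * n]; }
};
```

Reassigning a Handle only moves a reference (the old buffer is freed once nothing refers to it), while assign_row duplicates the data, so later changes to the vector do not affect the matrix.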
asked Apr 23 '20 14:04 by hsloot


1 Answer

Let's approach this experimentally: how much memory does R allocate, and how long does that take? First, let's run your function with an increasing number of columns. I am wrapping the calls in bench::mark, since it reports both the memory allocated by R (mem_alloc) and the execution time:

> bench::mark(rmexp1(100, 10),
+             rmexp1(100, 100),
+             rmexp1(100, 1000),
+             rmexp1(100, 10000),
+             check = FALSE)
#> # A tibble: 4 x 13
#>   expression              min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 rmexp1(100, 10)     46.93µs  52.61µs   16307.    10.35KB     8.24  7918     4
#> 2 rmexp1(100, 100)   381.41µs 538.42µs    1786.      3.9MB     4.14   863     2
#> 3 rmexp1(100, 1000)    4.83ms   5.08ms     187.     1.53MB     8.68    86     4
#> 4 rmexp1(100, 10000)  59.85ms  63.19ms      15.5   15.27MB     5.17     6     2
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

Unsurprisingly, a larger matrix takes longer and requires more memory. In addition, for the larger sizes the allocated memory is about twice the memory required for the output matrix: a 100 x 1000 matrix of doubles needs 100 * 1000 * 8 = 800,000 bytes, yet about 1.53 MB are allocated. So yes, we are allocating more memory than is needed here.
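One way to see where a factor of about two can come from: rmexp1 allocates the n x d output once plus a fresh length-d vector in each of the n iterations, so the temporaries add up to another n x d doubles. bench::mark's mem_alloc is the total memory allocated during the call, not the peak, so temporaries that are freed again still count. The plain C++ sketch below (hypothetical names, no R involved, ignoring R's per-vector headers and anything allocated inside Rcpp::rexp) mimics that pattern with a counting allocator. For n = 100 and d = 1000 it tallies 2 * 100 * 1000 * 8 = 1,600,000 bytes, which is about 1.53 MiB.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Running total of bytes requested (never decremented, i.e. a sum of all
// allocations rather than peak usage).
static std::size_t g_bytes_allocated = 0;

// Minimal allocator that records every allocation request.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t count) {
        g_bytes_allocated += count * sizeof(T);
        return static_cast<T*>(::operator new(count * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedVec = std::vector<double, CountingAllocator<double>>;

// Mimics rmexp1's allocation pattern: one n*d result plus a new length-d
// temporary in each of the n iterations.
CountedVec fill_with_temporaries(std::size_t n, std::size_t d) {
    assert(n > 0 && d > 0);
    CountedVec out(n * d, 0.0);           // the result, column-major
    for (std::size_t k = 0; k < n; ++k) {
        CountedVec values(d, 1.0);        // fresh temporary every iteration
        for (std::size_t j = 0; j < d; ++j) out[k + j * n] = values[j];
    }
    return out;                            // moved or elided, no extra allocation
}
```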

Is that performance critical? It depends. After all, you are creating exponentially distributed random variates, which takes time in itself. In addition, you are doing some unspecified computation in the "do something with values" step, which might take even longer. Let's take random-variate generation out of the picture by using alternative functions that only allocate memory, either initializing it to zero or leaving it uninitialized:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix rmzero(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k=0; k<n; k++) {
        values = Rcpp::NumericVector(d);                 // new R vector, zero-filled
        // do something with values 
        out(k, _) = values;
    }
    return out;
}

// [[Rcpp::export]]
NumericMatrix rmnoinit(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k=0; k<n; k++) {
        values = Rcpp::NumericVector(Rcpp::no_init(d));  // new R vector, left uninitialized
        // do something with values 
        out(k, _) = values;
    }
    return out;
}
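The only difference between rmzero and rmnoinit is whether the freshly allocated vector is filled with zeros. In standard C++ the same distinction is value-initialization versus default-initialization of a double array; the sketch below (plain C++ with hypothetical function names, not Rcpp code) shows both. The uninitialized variant skips the zero fill but must be completely written before any element is read.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Analogue of Rcpp::NumericVector(d): allocated and every element set to 0.0.
std::vector<double> make_zeroed(std::size_t d) {
    return std::vector<double>(d);  // value-initialization: all zeros
}

// Analogue of Rcpp::NumericVector(Rcpp::no_init(d)): allocated but the
// contents are indeterminate until written.
std::unique_ptr<double[]> make_uninitialized(std::size_t d) {
    return std::unique_ptr<double[]>(new double[d]);  // default-initialization
}

// Writes every element, which is required before reading a no-init buffer.
void fill_all(double* buf, std::size_t d, double value) {
    assert(buf != nullptr || d == 0);
    for (std::size_t i = 0; i < d; ++i) buf[i] = value;
}
```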

With bench::mark we get:

> bench::mark(rmexp1(100, 1000),
+             rmzero(100, 1000),
+             rmnoinit(100, 1000),
+             check = FALSE)
#> # A tibble: 3 x 13
#>   expression               min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 rmexp1(100, 1000)     4.83ms   5.05ms      190.    1.53MB     8.72    87     4
#> 2 rmzero(100, 1000)   509.74µs 562.24µs     1510.    1.53MB    60.4    525    21
#> 3 rmnoinit(100, 1000) 404.24µs 469.43µs     1785.    1.53MB    53.8    664    20
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

So only roughly 1/10 of your function's execution time (about 0.5 ms out of 5 ms) is due to memory allocation and other overhead. The rest is spent generating the random variates.
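If the allocation overhead did matter, a common remedy is to allocate the temporary buffer once, outside the loop, and refill it in every iteration. The sketch below shows that pattern in plain C++ rather than Rcpp; std::mt19937_64 and std::exponential_distribution are stand-ins for R's generator (so the numbers will not match Rcpp::rexp), and rmexp_reuse is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fills an n x d column-major matrix with Exp(rate) variates, reusing a single
// length-d buffer for every row instead of allocating one per iteration.
std::vector<double> rmexp_reuse(std::size_t n, std::size_t d, double rate, unsigned seed) {
    assert(rate > 0.0);
    std::mt19937_64 rng(seed);
    std::exponential_distribution<double> dist(rate);
    std::vector<double> out(n * d);  // result, column-major like an R matrix
    std::vector<double> values(d);   // allocated once, outside the loop
    for (std::size_t k = 0; k < n; ++k) {
        for (double& v : values) v = dist(rng);  // refill in place
        // ... "do something with values" would go here ...
        for (std::size_t j = 0; j < d; ++j) out[k + j * n] = values[j];
    }
    return out;
}
```

Only two buffers are allocated no matter how large n is; whether that pays off depends on how the allocation cost compares with the work done per row.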

If generating random variates is the actual bottleneck in your code, you might be interested in my dqrng package:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::depends(dqrng)]]
#include <dqrng.h>
// [[Rcpp::export]]
NumericMatrix rmdqexp1(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k=0; k<n; k++) {
        values = dqrng::dqrexp(d);
        // do something with values 
        out(k, _) = values;
    }
    return out;
}

With bench::mark we get:

> bench::mark(rmexp1(100, 1000),
+             rmdqexp1(100, 1000),
+             check = FALSE)
#> # A tibble: 2 x 13
#>   expression             min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr>          <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 rmexp1(100, 1000)   3.69ms 5.03ms      201.    1.53MB     6.36    95     3
#> 2 rmdqexp1(100, 1000) 1.09ms 1.21ms      700.    1.65MB    22.6    310    10
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

With the median time dropping from about 5 ms to 1.2 ms, quite a bit of time can be saved by using a faster random number generator.
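The generator is a swappable component. To illustrate that idea without relying on dqrng's internals, the sketch below implements the small xoroshiro128+ generator as a C++ UniformRandomBitGenerator and plugs it into std::exponential_distribution. It is an illustration under assumptions, not dqrng's code, and makes no claim about matching dqrng's speed (dqrng may also use different algorithms for the distribution itself). Xoroshiro128Plus and exp_variates are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Minimal xoroshiro128+ engine written as a UniformRandomBitGenerator,
// so it can drive the standard <random> distributions.
class Xoroshiro128Plus {
public:
    using result_type = std::uint64_t;

    // The state must not be all zero; real code would derive it from a
    // single seed, e.g. with splitmix64.
    Xoroshiro128Plus(std::uint64_t s0, std::uint64_t s1) : s_{s0, s1} {
        assert(s0 != 0 || s1 != 0);
    }

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()() {
        const std::uint64_t s0 = s_[0];
        std::uint64_t s1 = s_[1];
        const std::uint64_t result = s0 + s1;
        s1 ^= s0;
        s_[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16);
        s_[1] = rotl(s1, 37);
        return result;
    }

private:
    static std::uint64_t rotl(std::uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }
    std::uint64_t s_[2];
};

// Any engine meeting the UniformRandomBitGenerator requirements can be used here.
template <typename Engine>
std::vector<double> exp_variates(Engine& rng, std::size_t count, double rate) {
    std::exponential_distribution<double> dist(rate);
    std::vector<double> out(count);
    for (double& x : out) x = dist(rng);
    return out;
}
```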

answered Oct 03 '22 19:10 by Ralf Stubner