Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reproducing R rep with the times argument in C++ and Rcpp

Tags:

r

rcpp

I'm learning to use Rcpp. I'd like to use C++ to replicate the rep function in R. Rcpp includes several sugar functions that correspond to rep in R. (see bottom of page 3 at: http://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-quickref.pdf. As I understand the documentation, the sugar functions rep, rep_each, and rep_len take two arguments -- a vector and an integer. However, what I would like to do is replicate the functionality of rep in R when I use the times argument. In that case, you can supply two vectors. A quick example in R:

x <- c(10, 5, 12)
y <- c(2, 6, 3)

rep(x, times = y)
[1] 10 10  5  5  5  5  5  5 12 12 12

Thus rep with the times argument replicates each element of x as many times as the corresponding y value. As I understand it, I can't see any way to use the Rcpp sugar functions for this.

I have created the following C++ function that works:

// [[Rcpp::export]]
NumericVector reptest(NumericVector x, NumericVector y) {
    int n = y.size();
    NumericVector myvector(sum(y));
    int ind = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < y(i); ++j) {
            myvector(ind) = x[i];
            ind = ind + 1;
            }
        }
    return myvector;
}

x <- c(10, 5, 12)
y <- c(2, 6, 3)

reptest(x, y)
[1] 10 10  5  5  5  5  5  5 12 12 12

It is a bit slower than rep in R. I am wondering if there is anyway to speed this up or if anyone has a better idea. As I understand it, rep is calling C code, so maybe it will be near impossible to improve upon rep. My goal is to speed up an MCMC loop (that uses the rep function) that takes a lot of time to run in R, so any speedup would be useful. Other parts of the MCMC loop are the slow parts, not rep, but I need the same functionality in my loop.

like image 768
Scott Baldwin Avatar asked Feb 10 '15 21:02

Scott Baldwin


2 Answers

Here is a quick riff on the two intial versions. It also adds rep.int():

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector reptest(NumericVector x, NumericVector y) {
    int n = y.size();
    NumericVector myvector(sum(y));
    int ind = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < y[i]; ++j) {
            myvector[ind] = x[i];
            ind = ind + 1;
            }
        }
    return myvector;
}

// [[Rcpp::export]]
NumericVector reptest2(NumericVector x, NumericVector y) {
    int n = y.size();
    NumericVector myvector(sum(y));
    int ind=0;
    for (int i=0; i < n; ++i) {
        int p = y[i];
        std::fill(myvector.begin()+ind, myvector.begin()+ind+p, x[i]);
        ind += p;
    }
    return myvector;
}


/*** R
x <- rep(c(10, 5, 12), 10000)
y <- rep(c(20, 60, 30), 10000)
all.equal(reptest(x, y), reptest2(x, y), rep(x, times=y))

library(microbenchmark)
microbenchmark(reptest(x, y), reptest2(x, y), rep(x, times=y), rep.int(x, y))
***/

With this, we get a little closer but R still wins:

R> Rcpp::sourceCpp("/tmp/rep.cpp")

R> x <- rep(c(10, 5, 12), 10000)

R> y <- rep(c(20, 60, 30), 10000)

R> all.equal(reptest(x, y), reptest2(x, y), rep(x, times=y))
[1] TRUE

R> library(microbenchmark)

R> microbenchmark(reptest(x, y), reptest2(x, y), rep(x, times=y), rep.int(x, y))
Unit: milliseconds
              expr     min      lq    mean  median      uq       max neval
     reptest(x, y) 4.61604 4.74203 5.47543 4.78120 6.78039   7.01879   100
    reptest2(x, y) 3.14788 3.27507 5.25515 3.33166 5.24583 140.64080   100
 rep(x, times = y) 2.45876 2.56025 3.26857 2.60669 4.60116   6.76278   100
     rep.int(x, y) 2.42390 2.50241 3.38362 2.53987 4.56338   6.44241   100
R> 
like image 142
Dirk Eddelbuettel Avatar answered Oct 12 '22 17:10

Dirk Eddelbuettel


One way to speed it up would be to use std::fill instead of iterating through each element to be filled:

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector reptest2(NumericVector x, NumericVector y) {
  int n = y.size();
  std::vector<double> myvector(sum(y));
  int ind=0;
  for (int i=0; i < n; ++i) {
    std::fill(myvector.begin()+ind, myvector.begin()+ind+y[i], x[i]);
    ind += y[i];
  }
  return Rcpp::wrap(myvector);
}

On a larger example, this appears to get closer to rep:

x <- rep(c(10, 5, 12), 10000)
y <- rep(c(20, 60, 30), 10000)
all.equal(reptest(x, y), reptest2(x, y), rep(x, times=y))
# [1] TRUE

library(microbenchmark)
microbenchmark(reptest(x, y), reptest2(x, y), rep(x, times=y))
# Unit: milliseconds
#               expr      min       lq      mean   median        uq      max neval
#      reptest(x, y) 9.072083 9.297573 11.469345 9.522182 13.015692 20.47905   100
#     reptest2(x, y) 5.097358 5.270827  7.367577 5.436549  8.961004 15.68812   100
#  rep(x, times = y) 1.457933 1.499051  2.884887 1.561408  1.949750 13.21706   100
like image 37
josliber Avatar answered Oct 12 '22 16:10

josliber