 

My C++ functions with Rcpp::List inputs are very slow

Tags:

r

rcpp

While C++, and specifically the Rcpp package, has been tremendously helpful to me in speeding up my code, I noticed that my C++ functions that take a list or data frame as an input argument (arguments of type Rcpp::DataFrame or Rcpp::List) are much slower than my other C++ functions. I wrote some sample code and would like to ask for tricks that can make it faster:

First, let's simulate a list in R that contains two lists. Consider myList, a list with two elements, meas1 and meas2, each of which is itself a list holding one vector of measurements per subject. Here is the R code:

lappend <- function(lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

nSub <- 30
meas1 <- list()
meas2 <- list()
for (i in 1:nSub){
  meas1 <- lappend(meas1, rnorm(10))
  meas2 <- lappend(meas2, rnorm(10))
}
myList <- list(meas1 = meas1, meas2 = meas2)

Now, suppose I want a C++ function that, for each subject, computes the sum of its meas1 measurements and the sum of its meas2 measurements, and then creates two new measurements from these two sums (their product and their ratio). Finally, the function should return these new measurements as a list.
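To pin down what the function is supposed to compute, independently of R and Rcpp, here is a minimal plain C++ sketch. It assumes a `std::vector<std::vector<double>>` as a hypothetical stand-in for each inner R list; the names `Measures` and `combineSums` are illustrative only and not part of the original code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Per-subject results: one product and one ratio of sums per subject.
struct Measures {
    std::vector<double> myMult;
    std::vector<double> myDiv;
};

// meas1[i] and meas2[i] hold the measurements of subject i.
// Assumes meas2 has at least as many subjects as meas1.
Measures combineSums(const std::vector<std::vector<double>>& meas1,
                     const std::vector<std::vector<double>>& meas2) {
    const std::size_t nSub = meas1.size();
    Measures out{std::vector<double>(nSub), std::vector<double>(nSub)};
    for (std::size_t i = 0; i < nSub; ++i) {
        // Sum each subject's two measurement vectors once.
        const double s1 = std::accumulate(meas1[i].begin(), meas1[i].end(), 0.0);
        const double s2 = std::accumulate(meas2[i].begin(), meas2[i].end(), 0.0);
        out.myMult[i] = s1 * s2;
        out.myDiv[i] = s1 / s2;
    }
    return out;
}
```

For example, with sums 3 and 4 for the first subject, `myMult[0]` is 12 and `myDiv[0]` is 0.75.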

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::List mySlowListFn(Rcpp::List myList, int nSub){
   arma::vec myMult(nSub);
   arma::vec myDiv(nSub);
   for (int i = 0; i < nSub; i++){
     arma::vec meas1_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas1"])[i]);
     arma::vec meas2_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas2"])[i]);
     myMult[i] = arma::sum(meas1_i)*arma::sum(meas2_i);
     myDiv[i] = arma::sum(meas1_i)/arma::sum(meas2_i);
   }
   return Rcpp::List::create(Rcpp::Named("myMult") = myMult, 
                             Rcpp::Named("myDiv") = myDiv);
}

How can I make the function above faster? I'm particularly looking for ideas that keep the input and output lists in the code (since in my actual program working with lists is unavoidable) but use some tricks to reduce the overhead. One thing I thought of was passing the list by const reference:

 Rcpp::List mySlowListFn(const Rcpp::List& myList, int nSub)
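For context on the const-reference idea: as Rcpp documents, an Rcpp::List is a thin handle around an R object, so passing it by value copies the handle, not the list contents (Rcpp::clone() is what makes a deep copy). The plain C++ sketch below models that with a hypothetical `Handle` class built on `std::shared_ptr`; it is an analogy under that assumption, not Rcpp's actual implementation, and it suggests that switching to `const Rcpp::List&` by itself would save little.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a small handle pointing at data
// owned elsewhere. Copying the handle shares the data; nothing is duplicated.
class Handle {
public:
    explicit Handle(std::vector<double> data)
        : data_(std::make_shared<std::vector<double>>(std::move(data))) {}

    const double* data() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }

    // Explicit deep copy, analogous in spirit to Rcpp::clone().
    Handle clone() const { return Handle(*data_); }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// By value: copies only the handle (a pointer plus a reference-count update).
const double* byValue(Handle h) { return h.data(); }

// By const reference: no handle copy at all.
const double* byConstRef(const Handle& h) { return h.data(); }
```

Both calls see the same underlying buffer; only `clone()` allocates a new one.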

Thanks very much for your help.

asked Apr 18 '14 23:04 by Sam


1 Answer

First, note that the copying semantics for lists have changed in recent versions of R (definitely in the latest R-devel; I'm not sure whether the change made it into R 3.1.0): lists are now shallow-copied, and their elements are copied later only if they are modified. There is a good chance that, if you are running an older version of R, its more expensive list-copying semantics are getting in the way.
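The shallow-copy idea can be sketched as a toy model: copying the list copies only pointers to its elements, and an element is duplicated only when one of the copies writes to it. The `CowList` class below is hypothetical and illustrates the concept in plain C++; it is not how R implements this internally.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy copy-on-modify list (hypothetical, not R's implementation).
class CowList {
public:
    using Elem = std::vector<double>;

    void push_back(Elem e) { elems_.push_back(std::make_shared<Elem>(std::move(e))); }

    const Elem& get(std::size_t i) const { return *elems_[i]; }

    // Before writing, duplicate element i if another list still shares it.
    void set(std::size_t i, std::size_t j, double value) {
        if (elems_[i].use_count() > 1) {
            elems_[i] = std::make_shared<Elem>(*elems_[i]);
        }
        (*elems_[i])[j] = value;
    }

private:
    // Copying a CowList copies only these pointers: a shallow copy.
    std::vector<std::shared_ptr<Elem>> elems_;
};
```

After `CowList b = a;`, modifying `b`'s first element duplicates only that element; the second element is still shared between `a` and `b`.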

That said, here's how I would rewrite your function for some extra speed, along with a benchmark. Run sourceCpp() on the file to compare on your own machine.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::List mySlowListFn(Rcpp::List myList, int nSub){
   arma::vec myMult(nSub);
   arma::vec myDiv(nSub);
   for (int i = 0; i < nSub; i++){
     arma::vec meas1_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas1"])[i]);
     arma::vec meas2_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas2"])[i]);
     myMult[i] = arma::sum(meas1_i)*arma::sum(meas2_i);
     myDiv[i] = arma::sum(meas1_i)/arma::sum(meas2_i);
   }
   return Rcpp::List::create(Rcpp::Named("myMult") = myMult, 
                             Rcpp::Named("myDiv") = myDiv);
}

// [[Rcpp::export]]
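Rcpp::List myFasterListFn(Rcpp::List myList, int nSub) {

  // Allocate the outputs without zero-filling them; every element
  // is written in the loop below.
  Rcpp::NumericVector myMult = Rcpp::no_init(nSub);
  Rcpp::NumericVector myDiv = Rcpp::no_init(nSub);

  // Look up the inner lists once, outside the loop, instead of repeating
  // the name lookup and the as<List>() conversion on every iteration.
  Rcpp::List meas1 = myList["meas1"];
  Rcpp::List meas2 = myList["meas2"];

  for (int i = 0; i < nSub; i++) {

    // Wrap the R vector's memory directly (copy_aux_mem = false,
    // strict = true) instead of copying it into a new arma::vec.
    // Assumes every element is a double vector and that both lists
    // have at least nSub elements; nothing here checks either.
    arma::vec meas1_i( 
      REAL(VECTOR_ELT(meas1, i)), Rf_length(VECTOR_ELT(meas1, i)), false, true
    );

    arma::vec meas2_i(
      REAL(VECTOR_ELT(meas2, i)), Rf_length(VECTOR_ELT(meas2, i)), false, true
    );

    myMult[i] = arma::sum(meas1_i) * arma::sum(meas2_i);
    myDiv[i] = arma::sum(meas1_i) / arma::sum(meas2_i);
  }

  return Rcpp::List::create(
    Rcpp::Named("myMult") = myMult, 
    Rcpp::Named("myDiv") = myDiv
  );
}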

/*** R
library(microbenchmark)
lappend <- function(lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

nSub <- 30
n <- 10
meas1 <- list()
meas2 <- list()
for (i in 1:nSub){
  meas1 <- lappend(meas1, rnorm(n))
  meas2 <- lappend(meas2, rnorm(n))
}
myList <- list(meas1 = meas1, meas2 = meas2)
x1 <- mySlowListFn(myList, nSub)
x2 <- myFasterListFn(myList, nSub)
microbenchmark(
  mySlowListFn(myList, nSub),
  myFasterListFn(myList, nSub)
)
*/

gives me

> microbenchmark(
+   mySlowListFn(myList, nSub),
+   myFasterListFn(myList, nSub)
+ )
Unit: microseconds
                         expr    min      lq  median      uq    max neval
   mySlowListFn(myList, nSub) 14.772 15.4570 16.0715 16.7520 42.628   100
 myFasterListFn(myList, nSub)  4.502  5.0675  5.2470  5.8515 18.561   100
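Reading the two versions side by side, the faster one (1) looks up meas1 and meas2 once instead of on every iteration, and (2) reads each subject's vector in place instead of copying it via Rcpp::as<arma::vec>. The plain C++ sketch below mimics both shapes with hypothetical stand-ins (`NamedList`, `lookup`, `sumView`, `slowFn`, `fastFn`), assuming that name-based access behaves roughly like a linear search over the element names. It also computes each sum only once, an extra tweak that the answer's code does not make.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
// Hypothetical stand-in for a named R list: (name, list of vectors) pairs.
using NamedList = std::vector<std::pair<std::string, std::vector<Vec>>>;

// Stand-in for myList["name"]: a linear search over the element names.
const std::vector<Vec>& lookup(const NamedList& lst, const std::string& name) {
    for (const auto& entry : lst) {
        if (entry.first == name) return entry.second;
    }
    throw std::out_of_range("no element named " + name);
}

// Sums n doubles starting at p without copying them, in the spirit of
// wrapping existing memory in an arma::vec with copy_aux_mem = false.
double sumView(const double* p, std::size_t n) {
    return std::accumulate(p, p + n, 0.0);
}

// Slow shape: two name lookups and two vector copies per subject,
// and each sum is computed twice.
void slowFn(const NamedList& lst, std::size_t nSub, Vec& mult, Vec& ratio) {
    mult.assign(nSub, 0.0);
    ratio.assign(nSub, 0.0);
    for (std::size_t i = 0; i < nSub; ++i) {
        Vec m1 = lookup(lst, "meas1")[i];  // copy, like Rcpp::as<arma::vec>()
        Vec m2 = lookup(lst, "meas2")[i];
        mult[i] = sumView(m1.data(), m1.size()) * sumView(m2.data(), m2.size());
        ratio[i] = sumView(m1.data(), m1.size()) / sumView(m2.data(), m2.size());
    }
}

// Faster shape: look up each inner list once, read vectors in place,
// and compute each sum once.
void fastFn(const NamedList& lst, std::size_t nSub, Vec& mult, Vec& ratio) {
    const std::vector<Vec>& meas1 = lookup(lst, "meas1");  // hoisted
    const std::vector<Vec>& meas2 = lookup(lst, "meas2");  // hoisted
    mult.assign(nSub, 0.0);
    ratio.assign(nSub, 0.0);
    for (std::size_t i = 0; i < nSub; ++i) {
        const double s1 = sumView(meas1[i].data(), meas1[i].size());
        const double s2 = sumView(meas2[i].data(), meas2[i].size());
        mult[i] = s1 * s2;
        ratio[i] = s1 / s2;
    }
}
```

Both functions return identical results; the difference is only in how much repeated work and copying happens per subject.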

Future versions of Rcpp and Rcpp11 will have the ListOf<T> class, which will make it much easier to work with lists whose inner element type is known in advance, once the proper semantics have been ironed out.

answered Oct 27 '22 14:10 by Kevin Ushey