While C++, and specifically the Rcpp package, has been tremendously helpful to me in speeding up my code, I noticed that my C++ functions that take a list or data frame as an input argument (arguments of type Rcpp::DataFrame or Rcpp::List) are much slower than my other C++ functions. I wrote some sample code and would like to ask for tricks that can make it faster:
First, let's simulate a list in R that contains two lists. myList holds two lists, meas1 and meas2, and each of these in turn holds one vector of measurements per subject. Here is the R code:
lappend <- function(lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}
nSub <- 30
meas1 <- list()
meas2 <- list()
for (i in 1:nSub){
  meas1 <- lappend(meas1, rnorm(10))
  meas2 <- lappend(meas2, rnorm(10))
}
myList <- list(meas1 = meas1, meas2 = meas2)
Now, suppose I want a C++ function that, for each subject, computes the sum of its meas1 vector and the sum of its meas2 vector, derives two new measurements from these two sums (their product and their ratio), and finally returns the new measurements as a list.
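In symbols, writing x_{ij} and y_{ij} for the j-th entry of subject i's meas1 and meas2 vectors (notation introduced here only for illustration), the function should return, for each subject i = 1, ..., nSub:

```latex
S_i = \sum_{j} x_{ij}, \qquad T_i = \sum_{j} y_{ij}, \qquad
\mathrm{myMult}_i = S_i \, T_i, \qquad \mathrm{myDiv}_i = \frac{S_i}{T_i}
```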
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List mySlowListFn(Rcpp::List myList, int nSub){
  arma::vec myMult(nSub);
  arma::vec myDiv(nSub);
  for (int i = 0; i < nSub; i++){
    // Extract subject i's vectors from the "meas1" and "meas2" inner lists
    arma::vec meas1_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas1"])[i]);
    arma::vec meas2_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas2"])[i]);
    // Product and ratio of the two per-subject sums
    myMult[i] = arma::sum(meas1_i)*arma::sum(meas2_i);
    myDiv[i] = arma::sum(meas1_i)/arma::sum(meas2_i);
  }
  return Rcpp::List::create(Rcpp::Named("myMult") = myMult,
                            Rcpp::Named("myDiv") = myDiv);
}
How can I make the function above faster? I'm particularly looking for ideas that keep the list input and output (since dealing with lists is unavoidable in my own program) but use some tricks to reduce the overhead. One thing I thought of was:
Rcpp::List mySlowListFn(const Rcpp::List& myList, int nSub)
Thanks very much for your help.
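Whether a const reference helps depends on what copying the argument actually copies. As far as I know, an Rcpp::List is a thin handle around an R object (a SEXP): copying it shares the same underlying list and only registers it for protection again, so const Rcpp::List& saves that bookkeeping but not a data copy. The plain C++ sketch below models this with a hypothetical ListHandle type built on std::shared_ptr (an analogy, not Rcpp's actual implementation), contrasted with a std::vector, which is deep-copied when passed by value.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for Rcpp::List: the handle holds only a pointer to
// a shared payload, much as an Rcpp object wraps a SEXP.
struct ListHandle {
    std::shared_ptr<std::vector<std::vector<double>>> payload;
};

// By value: the handle is copied, but the payload is shared, not duplicated.
// The extra reference-count increment plays the role of Rcpp's protection
// bookkeeping in this analogy.
bool sharesPayloadByValue(ListHandle copy, const ListHandle& original) {
    return copy.payload.get() == original.payload.get() &&
           copy.payload.use_count() >= 2;
}

// By const reference: nothing is copied at all, not even the handle.
bool sharesPayloadByRef(const ListHandle& ref, const ListHandle& original) {
    return &ref == &original;
}

// A std::vector passed by value, by contrast, gets its own new buffer.
bool sharesBufferByValue(std::vector<double> copy, const double* original) {
    return copy.data() == original;
}
```

In this model the by-value and by-reference versions see the same data; the difference is only the handle copy, which is why switching to a const reference alone is unlikely to explain a large slowdown.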
First, note that the copying semantics for lists have changed in recent versions of R (definitely in the latest R-devel; I'm not sure whether it made it into R 3.1.0): lists are now shallow-copied, and their elements are only copied later if they are modified. If you are running an older version of R, there is a good chance that its more expensive list-copying semantics are getting in the way.
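To make the shallow-copy idea concrete, here is a toy copy-on-write list in plain C++ (a hypothetical CowList class, not how R implements this internally): copying the list copies only pointers to its elements, and an element is duplicated only when it is modified through a list that still shares it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write list of numeric vectors (a toy model only).
class CowList {
public:
    using Elem = std::vector<double>;

    explicit CowList(std::vector<Elem> elems) {
        for (auto& e : elems) {
            items_.push_back(std::make_shared<Elem>(std::move(e)));
        }
    }

    // The implicitly generated copy constructor copies the shared_ptrs,
    // so a copied list shares every element with the original (shallow copy).

    const Elem& get(std::size_t i) const { return *items_[i]; }

    // Modify one entry; duplicate element i first if another list shares it.
    void set(std::size_t i, std::size_t j, double value) {
        if (items_[i].use_count() > 1) {
            items_[i] = std::make_shared<Elem>(*items_[i]);
        }
        (*items_[i])[j] = value;
    }

    bool sharesElement(const CowList& other, std::size_t i) const {
        return items_[i] == other.items_[i];
    }

private:
    std::vector<std::shared_ptr<Elem>> items_;
};
```

In this model, copying a list is cheap regardless of how large its elements are; only the first write to a shared element pays for copying that one element.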
That said, here's how I would rewrite your function for some extra speed, together with a benchmark; sourceCpp the file to compare on your own machine.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List mySlowListFn(Rcpp::List myList, int nSub){
  arma::vec myMult(nSub);
  arma::vec myDiv(nSub);
  for (int i = 0; i < nSub; i++){
    arma::vec meas1_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas1"])[i]);
    arma::vec meas2_i = Rcpp::as<arma::vec>(Rcpp::as<Rcpp::List>(myList["meas2"])[i]);
    myMult[i] = arma::sum(meas1_i)*arma::sum(meas2_i);
    myDiv[i] = arma::sum(meas1_i)/arma::sum(meas2_i);
  }
  return Rcpp::List::create(Rcpp::Named("myMult") = myMult,
                            Rcpp::Named("myDiv") = myDiv);
}
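// [[Rcpp::export]]
Rcpp::List myFasterListFn(Rcpp::List myList, int nSub) {
  // Allocate the outputs without zero-filling them; every slot is
  // written in the loop below.
  Rcpp::NumericVector myMult = Rcpp::no_init(nSub);
  Rcpp::NumericVector myDiv = Rcpp::no_init(nSub);
  // Look up the inner lists by name once, outside the loop.
  Rcpp::List meas1 = myList["meas1"];
  Rcpp::List meas2 = myList["meas2"];
  for (int i = 0; i < nSub; i++) {
    // Wrap R's memory directly instead of copying it:
    // copy_aux_mem = false reuses the buffer, strict = true keeps the
    // arma::vec bound to it. REAL() assumes each element is a double
    // vector, and nSub must not exceed the length of either list.
    arma::vec meas1_i(
      REAL(VECTOR_ELT(meas1, i)), Rf_length(VECTOR_ELT(meas1, i)), false, true
    );
    arma::vec meas2_i(
      REAL(VECTOR_ELT(meas2, i)), Rf_length(VECTOR_ELT(meas2, i)), false, true
    );
    myMult[i] = arma::sum(meas1_i) * arma::sum(meas2_i);
    myDiv[i] = arma::sum(meas1_i) / arma::sum(meas2_i);
  }
  return Rcpp::List::create(
    Rcpp::Named("myMult") = myMult,
    Rcpp::Named("myDiv") = myDiv
  );
}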
/*** R
library(microbenchmark)
lappend <- function(lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}
nSub <- 30
n <- 10
meas1 <- list()
meas2 <- list()
for (i in 1:nSub){
  meas1 <- lappend(meas1, rnorm(n))
  meas2 <- lappend(meas2, rnorm(n))
}
myList <- list(meas1 = meas1, meas2 = meas2)
x1 <- mySlowListFn(myList, nSub)
x2 <- myFasterListFn(myList, nSub)
microbenchmark(
  mySlowListFn(myList, nSub),
  myFasterListFn(myList, nSub)
)
*/
Running it gives me:
Unit: microseconds
                         expr    min      lq  median      uq    max neval
   mySlowListFn(myList, nSub) 14.772 15.4570 16.0715 16.7520 42.628   100
 myFasterListFn(myList, nSub)  4.502  5.0675  5.2470  5.8515 18.561   100
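As a rough, R-free illustration of why the rewrite helps, the plain C++ sketch below contrasts the two access patterns on a hypothetical Named type (a vector of name/value pairs standing in for the nested R list; as far as I know, a by-name lookup such as myList["meas1"] likewise scans the names). The slow pattern repeats the lookup and copies each subject's vector on every iteration, as mySlowListFn does; the fast pattern hoists the lookups out of the loop and reads each vector through a pointer and length, the same idea as the arma::vec(ptr, n, false, true) constructor in myFasterListFn. It also computes each sum once instead of twice, a small extra change that neither Rcpp version makes. It does not model Rcpp's own conversion costs, so its timings would not match the benchmark above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named R list of lists: each entry pairs a
// name with one inner list, and each inner list holds one vector per subject.
using Inner = std::vector<std::vector<double>>;
using Named = std::vector<std::pair<std::string, Inner>>;

// Linear search by name, mimicking a by-name lookup on an R list.
const Inner& lookup(const Named& lst, const std::string& name) {
    for (const auto& entry : lst) {
        if (entry.first == name) return entry.second;
    }
    throw std::out_of_range("no element named " + name);
}

struct Result {
    std::vector<double> mult;
    std::vector<double> div;
};

// Slow pattern: look up both names and copy subject i's vectors on every
// iteration, then sum each copy twice.
Result slowPattern(const Named& lst, std::size_t nSub) {
    Result r{std::vector<double>(nSub), std::vector<double>(nSub)};
    for (std::size_t i = 0; i < nSub; ++i) {
        std::vector<double> m1 = lookup(lst, "meas1")[i];  // copies the vector
        std::vector<double> m2 = lookup(lst, "meas2")[i];  // copies the vector
        r.mult[i] = std::accumulate(m1.begin(), m1.end(), 0.0) *
                    std::accumulate(m2.begin(), m2.end(), 0.0);
        r.div[i] = std::accumulate(m1.begin(), m1.end(), 0.0) /
                   std::accumulate(m2.begin(), m2.end(), 0.0);
    }
    return r;
}

// Fast pattern: hoist the name lookups out of the loop, read each vector
// through a pointer + length view (no copy), and sum each vector once.
Result fastPattern(const Named& lst, std::size_t nSub) {
    const Inner& meas1 = lookup(lst, "meas1");
    const Inner& meas2 = lookup(lst, "meas2");
    assert(meas1.size() >= nSub && meas2.size() >= nSub);
    Result r{std::vector<double>(nSub), std::vector<double>(nSub)};
    for (std::size_t i = 0; i < nSub; ++i) {
        const double* p1 = meas1[i].data();
        const double* p2 = meas2[i].data();
        const double s1 = std::accumulate(p1, p1 + meas1[i].size(), 0.0);
        const double s2 = std::accumulate(p2, p2 + meas2[i].size(), 0.0);
        r.mult[i] = s1 * s2;
        r.div[i] = s1 / s2;
    }
    return r;
}
```

Both functions return the same numbers; the fast one simply does less repeated work per subject, which is the same trade-off myFasterListFn makes.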
Future versions of Rcpp and Rcpp11 will have the ListOf<T> class, which will make it much easier to interact with lists where we know the inner type beforehand, after the proper semantics have been ironed out.