Rcpp function to select (and to return) a sub-dataframe

Tags: r, rcpp

Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case, takes a subset), and returns the new data frame (in this question, a sub-dataframe)? My code below may make my question clearer:

code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

// Suppose I want to write a C++ function that gets id as input and returns
// a sub-dataframe corresponding to that id (**if it's possible to return
// a DataFrame in C++**)

// Auxiliary function --> helps get a sub-vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (arma::uword i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

// Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alph"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, ID);
  arma::vec alph_sub = myVecSubset(alph, id, ID);
  arma::vec mess_sub = myVecSubset(mess, id, ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main questions: 1) Is there any better way to take the sub-dataframe above in C++? (I wish I could simply say myDF[myDF$id == ID,]!)

2) Is there any way that I can combine id_sub, alph_sub, and mess_sub into an R data frame and return it?

I really appreciate your help.

Asked by Sam, Apr 03 '14 05:04


1 Answer

To add on to Romain's answer, you can try calling the [ operator through Rcpp. If we understand how df[x, ] is evaluated (i.e., it's really a call to "[.data.frame"(df, x, R_MissingArg)), this is easy to do:

#include <Rcpp.h>
using namespace Rcpp;

Function subset("[.data.frame");

// [[Rcpp::export]]
DataFrame subset_test(DataFrame x, IntegerVector y) {
  return subset(x, y, R_MissingArg);
}

/*** R
df <- data.frame(x=1:3, y=letters[1:3])
subset_test(df, c(1L, 2L))
*/

gives me

> df <- data.frame(x=1:3, y=letters[1:3])
> subset_test(df, c(1L, 2L))
  x y
1 1 a
2 2 b

Calling back into R from Rcpp is generally slower than doing the work in C++, but depending on how much of a bottleneck this is, it could still be fast enough for you.

Be careful though, as this function will use 1-based subsetting rather than 0-based subsetting for integer vectors.
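
To make the 1-based point concrete, here is a minimal standalone sketch (plain C++, no Rcpp types, so it can be compiled on its own) of how you might compute the row indices for something like myDF[myDF$id == ID, ]. The name matching_rows_1based is hypothetical and not part of Rcpp. In an actual Rcpp function you would presumably work with NumericVector and IntegerVector instead, and pass the result as y to subset_test above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the 1-based positions of
// the elements of `id` that equal `target`, i.e. the row numbers that R's
// `[` operator expects.
std::vector<int> matching_rows_1based(const std::vector<double>& id,
                                      double target) {
    std::vector<int> rows;
    for (std::size_t i = 0; i < id.size(); ++i) {
        if (id[i] == target) {
            // C++ position i is 0-based; R rows start at 1, so shift by one.
            rows.push_back(static_cast<int>(i) + 1);
        }
    }
    return rows;
}
```

Forgetting the + 1 would not raise an error on the R side: R silently drops index 0, so you would quietly get the wrong rows back.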

Answered by Kevin Ushey, Sep 20 '22 08:09