Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case, takes a subset), and returns the new data frame? My code below may make the question clearer:
code:
# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))
# Suppose I want to write a C++ function that gets id as input and returns
# a sub-dataframe corresponding to that id (**If it's possible to return
# DataFrame in C++**)
// Auxiliary function: helps get a sub-vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
    // positions at which IDVec equals ID
    arma::uvec AuxVec = find(IDVec == ID);
    arma::vec rslt = arma::vec(AuxVec.size());
    for (arma::uword i = 0; i < AuxVec.size(); i++){
        rslt[i] = vecMain[AuxVec[i]];
    }
    return rslt;
}
// Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
    arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
    arma::vec alph = Rcpp::as<arma::vec>(myDF["alph"]);
    arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);
    // here I take a sub-vector:
    arma::vec id_sub = myVecSubset(id, id, ID);
    arma::vec alph_sub = myVecSubset(alph, id, ID);
    arma::vec mess_sub = myVecSubset(mess, id, ID);
    // here is the CHALLENGE: How to combine these vectors into a new data frame???
    ???
}
In summary, there are two main questions: 1) Is there a better way to take the sub-data-frame above in C++? (I wish I could simply write myDF[myDF$id == ID, ]!)
2) Is there any way to combine id_sub, alph_sub, and mess_sub into an R data frame and return it?
I really appreciate your help.
To add on to Romain's answer, you can try calling the [ operator through Rcpp. If we understand how df[x, ] is evaluated (i.e., it's really a call to "[.data.frame"(df, x, R_MissingArg)), this is easy to do:
#include <Rcpp.h>
using namespace Rcpp;
Function subset("[.data.frame");
// [[Rcpp::export]]
DataFrame subset_test(DataFrame x, IntegerVector y) {
return subset(x, y, R_MissingArg);
}
/*** R
df <- data.frame(x=1:3, y=letters[1:3])
subset_test(df, c(1L, 2L))
*/
gives me
> df <- data.frame(x=1:3, y=letters[1:3])
> subset_test(df, c(1L, 2L))
  x y
1 1 a
2 2 b
Calling back into R from Rcpp is generally slower than staying in C++, but depending on how much of a bottleneck this is, it could still be fast enough for you.
Be careful though, as this function will use 1-based subsetting rather than 0-based subsetting for integer vectors.
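To make the 1-based point concrete, here is a minimal sketch in plain standard C++ (no Rcpp or Armadillo needed to compile it) of how the row positions for a given id might be computed before being handed to the subsetting callback. The helper name matching_rows_one_based is hypothetical and not part of any library; the only point it illustrates is the +1 shift from C++ positions to R row numbers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the 1-based row positions
// at which ids[i] == target, i.e. row numbers as R's `[` expects them.
std::vector<int> matching_rows_one_based(const std::vector<double>& ids,
                                         double target) {
    std::vector<int> rows;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (ids[i] == target) {
            // C++ positions start at 0, R row numbers start at 1
            rows.push_back(static_cast<int>(i) + 1);
        }
    }
    return rows;
}
```

Inside an Rcpp source file, a vector like this could presumably be converted with Rcpp::wrap and passed as the y argument of subset_test above. For the id column in the question (five 1s followed by five 2s), asking for id 2 would select rows 6 through 10.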