Rcpp Create DataFrame with Variable Number of Columns

Tags: c++, r, rcpp

I am interested in using Rcpp to create a data frame with a variable number of columns. By that, I mean that the number of columns will be known only at runtime. Some of the columns will be standard, but others will be repeated n times where n is the number of features I am considering in a particular run.

I am aware that I can create a data frame as follows:

IntegerVector i1(3); i1[0]=4;i1[1]=2134;i1[2]=3453;
IntegerVector i2(3); i2[0]=4123;i2[1]=343;i2[2]=99123;
DataFrame df = DataFrame::create(Named("V1")=i1,Named("V2")=i2);

but in this case it is assumed that the number of columns is 2.

To simplify the explanation of what I need, assume that I would like to pass a SEXP variable specifying the number of columns to create in the variable part. Something like:

RcppExport SEXP myFunc(SEXP n, SEXP <other stuff>)
IntegerVector i1(3); <compute i1>
IntegerVector i2(3); <compute i2>
for(int i=0;i<n;i++){compute vi}
DataFrame df = DataFrame::create(Named("Num")=i1,Named("ID")=i2,...,other columns v1 to vn);

where n is passed as an argument. The final data frame in R would look like

Num ID V1 ... Vn
  1  2  5     'aasda'
  ...

(In reality, the column names will not be of the form "Vx", but they will be known at runtime.) In other words, I cannot use a static list of

Named()=...

since the number will change.

I have tried skipping the "Named()" part of the constructor and then naming the columns at the end, but the results are junk.

Can this be done?

asked Aug 17 '15 21:08 by xbot


2 Answers

If I understand your question correctly, it seems like it would be easiest to take advantage of the DataFrame constructor that takes a List as an argument (since the size of a List can be specified directly), and set the names of your columns via .attr("names") and a CharacterVector:


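// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <string>  // std::to_string (C++11)

// [[Rcpp::export]]
Rcpp::DataFrame myFunc(int n, Rcpp::List lst,
                       Rcpp::CharacterVector Names = Rcpp::CharacterVector::create()) {

  // The loop below reads lst[0] .. lst[n - 1], so n must not exceed its length.
  if (n < 0 || n > lst.size()) {
    Rcpp::stop("n must be between 0 and length(lst)");
  }

  // Two fixed columns followed by n columns determined at runtime.
  Rcpp::List tmp(n + 2);
  tmp[0] = Rcpp::IntegerVector(3);
  tmp[1] = Rcpp::IntegerVector(3);

  // Use Names when it covers every element of lst; otherwise use the
  // names attribute of lst.
  Rcpp::CharacterVector lnames = Names.size() < lst.size() ?
    lst.attr("names") : Names;
  Rcpp::CharacterVector names(n + 2);
  names[0] = "Num";
  names[1] = "ID";

  for (int i = 0; i < n; i++) {
    // tmp[i + 2] = do_something(lst[i]);
    tmp[i + 2] = lst[i];
    if (std::string(lnames[i]).compare("") != 0) {
      names[i + 2] = lnames[i];
    } else {
      // Empty name: fall back to "V" plus the zero-based index.
      names[i + 2] = "V" + std::to_string(i);
    }
  }
  Rcpp::DataFrame result(tmp);
  result.attr("names") = names;
  return result;
}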

The extra logic there makes the Names vector optional: if you pass a named list, you can omit the third argument.


lst1 <- list(1L:3L, 1:3 + .25, letters[1:3])
##
> myFunc(length(lst1), lst1, c("V1", "V2", "V3"))
#  Num ID V1   V2 V3
#1   0  0  1 1.25  a
#2   0  0  2 2.25  b
#3   0  0  3 3.25  c

lst2 <- list(
  Column1 = 1L:3L,
  Column2 = 1:3 + .25,
  Column3 = letters[1:3],
  Column4 = LETTERS[1:3])
##
> myFunc(length(lst2), lst2)
#  Num ID Column1 Column2 Column3 Column4
#1   0  0       1    1.25       a       A
#2   0  0       2    2.25       b       B
#3   0  0       3    3.25       c       C
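
The naming rule used in myFunc (take the supplied name when it is non-empty, otherwise fall back to "V" plus the index) can be looked at on its own. Below is a minimal sketch in plain C++ with no Rcpp types, so it can be compiled and checked outside R. The helper name resolveColumnNames is hypothetical and not part of Rcpp, and unlike myFunc it also treats a missing name as empty, which is an assumption on my part:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): models the column-naming rule
// with standard containers only. The two fixed names come first, then n
// names that are either supplied by the caller or generated.
std::vector<std::string> resolveColumnNames(
    const std::vector<std::string>& supplied, std::size_t n) {
  std::vector<std::string> names;
  names.reserve(n + 2);
  names.push_back("Num");
  names.push_back("ID");
  for (std::size_t i = 0; i < n; ++i) {
    // Assumption: a name that is missing or empty triggers the fallback.
    if (i < supplied.size() && !supplied[i].empty()) {
      names.push_back(supplied[i]);
    } else {
      // Zero-based index in the fallback name, as in myFunc.
      names.push_back("V" + std::to_string(i));
    }
  }
  return names;
}
```

For instance, calling it with the names "a", "" and "c" and n = 4 should give Num, ID, a, V1, c, V3.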

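Just be aware of the 20-argument limit pointed out by @hrbrmstr. As far as I can tell, it applies to the variadic DataFrame::create() and List::create() helpers, which accept at most 20 arguments, rather than to filling a List by index as done above.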

answered Sep 20 '22 00:09 by nrussell


It's an old question, but I think more people are struggling with this, as I was. Starting from the other answers here, I arrived at a solution that is not subject to the 20-column limit mentioned above:

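// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <sstream>  // std::ostringstream
#include <string>

using namespace Rcpp;

// Build a named list of numColumns integer columns, two rows each.
// [[Rcpp::export]]
List variableColumnList(int numColumns=30) {
    List retval;
    for (int i=0; i<numColumns; i++) {
        // Column names V1, V2, ... are generated on the fly.
        std::ostringstream colName;
        colName << "V" << i+1;
        retval.push_back( IntegerVector::create(100*i, 100*i + 1),colName.str());
    }
    return retval;
}

// Convert the list with R's as.data.frame().
// [[Rcpp::export]]
DataFrame variableColumnListAsDF(int numColumns=30) {
    Function asDF("as.data.frame");

    return asDF(variableColumnList(numColumns));
}

// Convert the list to a tibble (requires the tibble package).
// tbl_df() is deprecated in dplyr, so as_tibble() is used instead.
// [[Rcpp::export]]
DataFrame variableColumnListAsTibble(int numColumns=30) {
    Environment tibble = Environment::namespace_env("tibble");
    Function asTibble = tibble["as_tibble"];

    return asTibble(variableColumnList(numColumns));
}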

So, build a C++ List first by pushing columns onto an empty List. (I generate the values and the column names on the fly here.) Then either return it as an R list, or use one of the two helper functions to convert it into a data.frame or a tibble. One could do the conversion from R instead, but I find this cleaner.
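
To check what the loop produces without going through R, here is a rough plain C++ mirror of variableColumnList. It is only a sketch: the Column alias and the buildColumns name are made up for illustration, and std::vector stands in for the Rcpp List. Nothing in the loop depends on the column count, which is why a 20-column limit does not arise here:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one named column: (name, values).
using Column = std::pair<std::string, std::vector<int>>;

// Mirrors the loop in variableColumnList with standard containers:
// column i (zero-based) is named "V<i+1>" and holds {100*i, 100*i + 1}.
std::vector<Column> buildColumns(int numColumns) {
  std::vector<Column> cols;
  if (numColumns > 0) {
    // The count is known up front, so reserve once instead of regrowing.
    cols.reserve(static_cast<std::size_t>(numColumns));
  }
  for (int i = 0; i < numColumns; ++i) {
    std::ostringstream colName;
    colName << "V" << i + 1;
    cols.emplace_back(colName.str(), std::vector<int>{100 * i, 100 * i + 1});
  }
  return cols;
}
```

Reserving up front has a counterpart on the Rcpp side: the first answer sizes its List in the constructor (Rcpp::List tmp(n + 2)) and assigns by index, which is probably preferable to repeated push_back when the number of columns is large.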

answered Sep 21 '22 00:09 by ArtHarg