
Appending to elements within an Rcpp List

Tags: r, rcpp

Possibly a stupid question, but I've hunted around a lot for an answer and been unable to find one:

I'm trying to write a file reader, a la fread or read.delim, but implemented in C++ and connected to R via Rcpp. The easiest way to do this and have it output a data.frame is to have it produce a List of vectors, one for each column, and set the class to data.frame:

List foo;
foo.push_back(column);
foo.attr("class") = "data.frame";
return foo;

Simple enough, and I've done it before. Unfortunately:

  1. The file(s) I want to read in can have a varying number of fields;
  2. This model only works elegantly if you're reading from the file column-wise, while actual files tend to be read row-wise.

So, the answer is to be able to define foo and then, for each row I read in, push_back() a field on to each of foo's underlying vectors:

List foo(1);
foo[0].push_back("turnip");

Unfortunately I can't work out how to do that: a List's member vectors don't appear to support push_back(), since the attempt fails with the error "Rcpp::Vector<19>::Proxy has no member named push_back()".

So, my question: is there any way to append to a vector within an Rcpp list? Or is my only option to read the file in column-by-column, appending the resulting vectors to "foo", and bite the performance cost that's going to result from having to iterate through it [number of columns] times instead of once?

Hopefully this question is clear enough. Happy to answer any questions.

Oliver Keyes asked Dec 20 '14 23:12



1 Answer

It is a semi-hard problem when you know neither rows nor columns beforehand.

In a closed-source work project a few years ago, I collected my data as a variant type (using the corresponding Boost class) and converted at the end.
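To make the "collect as a variant, convert at the end" idea concrete, here is a minimal sketch. It is not the code from that project: it uses std::variant from C++17 as a stand-in for the Boost variant class, and the names Cell, Row, column_is_numeric, numeric_column and text_column are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Assumption: a cell is either a number or a piece of text.
using Cell = std::variant<double, std::string>;
using Row = std::vector<Cell>;

// True when every collected row has a numeric cell at position j
// (also true when no rows have been collected).
bool column_is_numeric(const std::vector<Row>& rows, std::size_t j) {
    for (const Row& r : rows) {
        if (j >= r.size() || !std::holds_alternative<double>(r[j])) return false;
    }
    return true;
}

// Final conversion step: pull column j out as doubles.
// Only call this after column_is_numeric(rows, j) returned true.
std::vector<double> numeric_column(const std::vector<Row>& rows, std::size_t j) {
    std::vector<double> out;
    out.reserve(rows.size());
    for (const Row& r : rows) out.push_back(std::get<double>(r.at(j)));
    return out;
}

// Final conversion step for text: numeric cells are formatted with
// std::to_string, and cells missing from short rows become empty strings.
std::vector<std::string> text_column(const std::vector<Row>& rows, std::size_t j) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const Row& r : rows) {
        if (j >= r.size()) {
            out.emplace_back();
        } else if (const double* d = std::get_if<double>(&r[j])) {
            out.push_back(std::to_string(*d));
        } else {
            out.push_back(std::get<std::string>(r[j]));
        }
    }
    return out;
}
```

Rows are stored as they arrive, and the column type is only decided once everything has been read, at the cost of one extra pass per column.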

In Rblpapi (to which I contributed some other code), Whit tried a few approaches and ended up defining his own helper functions and I have been meaning to distill / refactor this and discuss it with Kevin -- but that hasn't happened yet.

So feel free to come up with something better :)

Generally speaking, and getting back to your problem, we frequently receive data row-wise, often via call-backs. The Rcpp types (wrapping R types) do very poorly when you append element by element -- so don't do the naive push_back as you will end up copying a lot.

So if you know your types, use a std::list of the corresponding std::vector<T> for the given T. These vectors you can grow. Once you have them, assembling an Rcpp::List and hence an Rcpp::DataFrame is easier.
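As a rough illustration of growing plain standard-library vectors row by row, here is a sketch in standard C++ only. It assumes every column is text, the delimiter is a single character, and there is no quoting; ColumnBuffer, split_line and read_columns are hypothetical names, and a std::vector of columns is used instead of a std::list for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one growable std::vector<std::string> per column,
// filled while rows arrive one at a time.
struct ColumnBuffer {
    std::vector<std::vector<std::string>> columns;
    std::size_t n_rows = 0;

    // Append one row. Rows may have differing numbers of fields, so short
    // rows and newly appearing columns are padded with empty strings.
    void add_row(const std::vector<std::string>& fields) {
        if (fields.size() > columns.size()) {
            // Back-fill new columns for the rows that were already read.
            columns.resize(fields.size(), std::vector<std::string>(n_rows));
        }
        for (std::size_t j = 0; j < columns.size(); ++j) {
            columns[j].push_back(j < fields.size() ? fields[j] : std::string());
        }
        ++n_rows;
        // Invariant: every column has exactly one entry per row.
        assert(columns.empty() || columns.back().size() == n_rows);
    }
};

// Split one line on a single-character delimiter (no quoting support;
// a trailing empty field is dropped).
std::vector<std::string> split_line(const std::string& line, char delim) {
    std::vector<std::string> out;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) out.push_back(field);
    return out;
}

// Read delimited text row by row into column vectors, in a single pass.
ColumnBuffer read_columns(const std::string& text, char delim) {
    ColumnBuffer buf;
    std::stringstream ss(text);
    std::string line;
    while (std::getline(ss, line)) buf.add_row(split_line(line, delim));
    return buf;
}
```

Inside an Rcpp function, each finished column could then be converted once with Rcpp::wrap() and stored in a pre-sized Rcpp::List. Note that a data.frame also needs names and row.names attributes in addition to the class attribute.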

Dirk Eddelbuettel answered Sep 27 '22 23:09
