
Remove input R object in C++ function environment

Tags: c++, r, rcpp

I have an Rcpp function that is called from inside an R function. The R function produces some object (say, a large list) and passes it to the Rcpp function. Inside the Rcpp function, I process the R object and load the results into a number of C++ classes. At that point the R object is no longer needed, and I want to remove it to free memory for the main algorithms.

The idea is:

// [[Rcpp::export]]
void cppFun(List structuredData)
{
  // copy structuredData to C++ classes
  // Now I want structuredData gone to save memory
  // main algorithms ...
}

/***R
rFun <- function(input)
{
  # R creates structuredData from input
  cppFun(structuredData)
}
*/

I tried calling R's "rm()" in C++ but it can only identify the object names in R's global environment. For example:

// [[Rcpp::export]]
void cppFun()
{
  Language("rm", "globalDat").eval(); 
  Language("gc").eval();
}

/***R
globalDat = 1:10
ls() # shows "globalDat" is created.
cppFun()
ls() # shows "globalDat" is no longer in the environment.
*/

However, the following does not work:

// [[Rcpp::export]]
void cppFun()
{
  Language("rm", "locDat").eval();
  Language("gc").eval();
}

/***R
rFun <- function (x)
{
  locDat = x
  print(ls()) # shows "x" and "locDat" are created
  cppFun()
  print(ls())
}

globalDat = 1:10
ls() # shows "globalDat" is created.
rFun(globalDat) # prints "x" and "locDat" twice, plus a warning message: In rm("locDat") : object 'locDat' not found

locDat = globalDat
rFun(globalDat) # this still removes "locDat" from the global environment, not the local one.
*/

Am I on the right track to the goal? Is there any better way?

Thank you!

Thought of a hacky solution:

  1. Write a shell class wrapping references to all the necessary C++ structured data classes.

  2. In the R function: (i) process the input; (ii) pass the structured R data to the Rcpp function; (iii) in the Rcpp function, allocate a shell class object with new and load the structured R data into it; (iv) memcpy the shell class pointer into a double (8 bytes; on a 32-bit system, use an int); (v) return the double from the Rcpp function; (vi) return the double from the R function. The structured R object is now out of scope, while the C++ shell object allocated with new is still alive. Call gc() to reclaim the R object's memory.

  3. Pass the double to the main C++/Rcpp function and memcpy it back into a shell class pointer. Delete the shell object through that pointer before the function returns.

Tests show the above works. I just found that "external pointers", or Rcpp::XPtr, seem to be designed for a similar purpose?
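The pointer round trip in steps 2 and 3 can be sketched in plain C++, without any Rcpp types. This is only an illustration under assumptions: `Shell`, `packPointer`, `unpackPointer`, `makeShell` and `consumeShell` are hypothetical names, and the shell here holds a single vector rather than several structured classes. It also shows why the trick is fragile: nothing frees the object if the handle is lost or used twice, which is the kind of problem an external pointer such as Rcpp::XPtr is meant to address.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical "shell" class that owns the C++-side copy of the data.
struct Shell {
    std::vector<double> values;
};

// The trick only works if a pointer fits into a double.
static_assert(sizeof(Shell*) <= sizeof(double),
              "pointer must fit in a double");

// Step 2 (iv): copy the pointer's bytes into a double.
double packPointer(Shell* p) {
    double handle = 0.0;
    std::memcpy(&handle, &p, sizeof(p));
    return handle;
}

// Step 3: copy the bytes back out of the double into a pointer.
Shell* unpackPointer(double handle) {
    Shell* p = nullptr;
    std::memcpy(&p, &handle, sizeof(p));
    return p;
}

// Step 2 (iii)-(v): deep-copy the input into a heap-allocated shell
// and return only the handle. The caller may now drop `input`.
double makeShell(const std::vector<double>& input) {
    Shell* shell = new Shell{input};
    return packPointer(shell);
}

// Step 3: recover the shell, run the "main algorithm" (a sum here),
// and delete the shell before returning.
double consumeShell(double handle) {
    Shell* shell = unpackPointer(handle);
    assert(shell != nullptr);
    double total = 0.0;
    for (double v : shell->values) total += v;
    delete shell;  // the handle must not be used again after this
    return total;
}
```

Each handle must be consumed exactly once: forgetting to call `consumeShell` leaks the shell, and calling it twice on the same handle is a double delete.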

asked Aug 17 '17 22:08 by user2961927


1 Answer

Doing something along these lines is considered an antipattern in Rcpp and is highly counterproductive. The problem is that Rcpp performs a shallow copy when moving an R object to C++, which means the C++ object shares its memory allocation with the R object. If you remove the R object while a C++ object still references it, a segmentation fault (segfault) is likely later in the process.

Now, if you intend to make a deep copy from the R object into a C++ structure, this is not quite as toxic: after a deep copy, the data no longer references the original R object. However, deep copying is not Rcpp's default behavior.
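The difference between a shallow and a deep copy can be illustrated with a plain C++ analogy. This is a sketch, not Rcpp itself: `BorrowedView`, `makeView` and `makeDeepCopy` are hypothetical names, and a `std::vector` stands in for the memory owned by the R object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-owning view: it stores only a pointer into a buffer owned by
// someone else, much like a shallow wrapper around R-owned memory.
struct BorrowedView {
    const double* data;
    std::size_t size;

    double at(std::size_t i) const {
        assert(i < size);
        return data[i];
    }
};

// Shallow "copy": no elements are copied, only the address and length.
BorrowedView makeView(const std::vector<double>& owner) {
    return BorrowedView{owner.data(), owner.size()};
}

// Deep copy: the result owns a separate allocation and stays valid
// even after the original buffer is modified or destroyed.
std::vector<double> makeDeepCopy(const std::vector<double>& owner) {
    return std::vector<double>(owner.begin(), owner.end());
}
```

A change to the owner is visible through the view but not through the deep copy. If the owner were destroyed, reading through the view would be undefined behavior, which is the C++ counterpart of the segfault risk described above.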

With that said, I strongly discourage deleting objects mid-process. If you are truly memory-strapped, try "chunking" (dividing the data into smaller pieces), performing the operations in a database, buying additional RAM, or waiting for ALTREP.
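As a rough illustration of the chunking suggestion, the sketch below processes a data set one piece at a time, so only a single chunk is held in memory at once. Everything here is an assumption made for the example: `sumInChunks` is a hypothetical name, `loadChunk` is a hypothetical callback that materializes items in the range [begin, end), and a running sum stands in for the real algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical loader type: returns the items with indices [begin, end).
using ChunkLoader =
    std::function<std::vector<double>(std::size_t, std::size_t)>;

// Sum `total` items while holding at most `chunkSize` of them in memory.
double sumInChunks(std::size_t total, std::size_t chunkSize,
                   const ChunkLoader& loadChunk) {
    assert(chunkSize > 0);
    double sum = 0.0;
    for (std::size_t begin = 0; begin < total; begin += chunkSize) {
        std::size_t end = std::min(begin + chunkSize, total);
        // Only this chunk is resident; it is freed at the end of the
        // loop body, before the next chunk is loaded.
        std::vector<double> chunk = loadChunk(begin, end);
        for (double v : chunk) sum += v;
    }
    return sum;
}
```

Whether this helps depends on the algorithm: it only works when the computation can be expressed as a pass over independent pieces with a small amount of carried state.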

answered Sep 28 '22 07:09 by coatless