Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

RcppArmadillo: Issue with memory usage

Tags:

c++

memory

r

rcpp

I have started using Rcpp. I like it a lot. I am fairly new to programming. I have a question regarding memory usage. Below is a reproducible problem:

library(RcppArmadillo)
library(inline)
code <- "
  Rcpp::NumericVector input_(input);
  arma::cube disturb(input_.begin(), 2, 2, 50000000, false);
  return wrap(2);
"
Test <- cxxfunction(signature(input = "numeric"), plugin = "RcppArmadillo", body = code)
input <- array(rnorm(2 * 2 * 50000000), dim = c(2, 2, 50000000))
Test(input)

My understanding is that in the problem above the only memory usage is when I assign an array to the variable input in R. So I should only be using around 1.6 gb (2*2*50*8 = 1600). When I go to Rcpp, I initialise the variable input_ using SEXP object which is a pointer. So this should not use any additional memory. Then when I initialise the variable disturb, I also use a pointer and set copy_aux = FALSE. So I should not be using any memory. So if my understanding is correct, I should only be using 1.6 gb when I run the code. Is this correct?

However, when I run the code, the memory usage (judging by looking at System Monitor in Ubuntu) jumps above 10 gb (from around 1 gb) before falling back down to around 4 gb. I don't understand what is going on. Did I use Rcpp incorrectly?

Your help is appreciated. Many thanks.

like image 661
master_goon Avatar asked Jul 15 '15 19:07

master_goon


1 Answers

Edit after new version of Armadillo (5.300)

After this initial Q/A on StackOverflow, Conrad Sanderson and I engaged in some email discussion about this issue. By design, the arma::cube objects create an arma::mat for each slice (the third dimension) of the cube. This is done during the creation of the cube, even if the data is copied from existing memory (as in the original question). Since this is not always needed, I suggested there should be an option to disable the pre-allocation of matrices for the slices. As of the current version of Armadillo (5.300.4), there now is. This can be installed from CRAN.

Example code:

library(RcppArmadillo)
library(inline)
code <- "
  Rcpp::NumericVector input_(input);
  arma::cube disturb(input_.begin(), 2, 2, 50000000, false, true, false);
  return wrap(2);
"
Test <- cxxfunction(signature(input = "numeric"), plugin = "RcppArmadillo", body = code)
input <- array(rnorm(2 * 2 * 50000000), dim = c(2, 2, 50000000))
Test(input)

The key thing here is that the cube constructor is now called using arma::cube disturb(input.begin(), 2, 2, 50000000, false, true, false);. The final false here is the new prealloc_mat parameter which determines whether or not to pre-allocate the matrices. The slice method will still work fine on a cube without pre-allocated matrices - the matrix will be allocated on demand. However, if you're directly accessing the mat_ptrs member of a cube it will be filled with NULL pointers. The help has also been updated.

Many thanks to Conrad Sanderson for acting so quickly to provide this additional option, and to Dirk Eddelbuettel for all his work on Rcpp and RcppArmadillo!

Original answer

It's a slightly bizarre one. I've tried with a range of different array sizes, and the problem only occurs with arrays where the 3rd dimension is much bigger than the other 2. Here's a reproducible example:

library("RcppArmadillo")
library("inline")
code <- "
Rcpp::NumericVector input_(input);
IntegerVector dim = input_.attr(\"dim\");
arma::cube disturb(input_.begin(), dim[0], dim[1], dim[2], false);
disturb[0, 0, 0] = 45;
return wrap(2);
"
Test <- cxxfunction(signature(input = "numeric"), plugin = "RcppArmadillo", body = code)
input <- array(0, c(1e7, 2, 2))
Test(input)
# no change in memory usage

dim(input) <- c(2, 1e7, 2)
gc()
Test(input)
# no change in memory usage

dim(input) <- c(2, 2, 1e7)
gc()
Test(input)
# spike in memory usage

dim(input) <- c(20, 2, 1e6)
gc()
Test(input)
# no change in memory usage

This suggests it's something about the way that the Aramadillo library is implemented (or possibly RcppArmadillo). It certainly doesn't seem to be something you're doing wrong.

Note I've included some modification in place of the data (setting the first element to 45), and you can confirm that in each case the data is modified in place, suggesting there isn't a copy going on.

For now, I'd suggest if possible organising your 3d arrays such that the largest dimension isn't the third one.

EDIT After doing some more digging, it looks as though there is allocation of RAM during the creation of the arma::cube. In Cube_meat.hpp, in the create_mat method, there's the following code:

if(n_slices <= Cube_prealloc::mat_ptrs_size)
  {
  access::rw(mat_ptrs) = const_cast< const Mat<eT>** >(mat_ptrs_local);
  }
else
  {
  access::rw(mat_ptrs) = new(std::nothrow) const Mat<eT>*[n_slices];

  arma_check_bad_alloc( (mat_ptrs == 0), "Cube::create_mat(): out of memory" );
  }
}

Cube_prealloc::mat_ptrs_size seems to be 4, so it's actually an issue for any array with more than 4 slices.

I posted an issue on github.

EDIT2 However, it's definitely an issue with the underlying Armadillo code. Here's a reproducible example which doesn't use Rcpp at all. This is linux-only - it uses code from How to get memory usage at run time in c++? to pull out the current memory usage of the running process.

#include <iostream>
#include <armadillo>
#include <unistd.h>
#include <ios>
#include <fstream>
#include <string>

//////////////////////////////////////////////////////////////////////////////
//
// process_mem_usage(double &, double &) - takes two doubles by reference,
// attempts to read the system-dependent data for a process' virtual memory
// size and resident set size, and return the results in KB.
//
// On failure, returns 0.0, 0.0

void process_mem_usage(double& vm_usage, double& resident_set)
{
   using std::ios_base;
   using std::ifstream;
   using std::string;

   vm_usage     = 0.0;
   resident_set = 0.0;

   // 'file' stat seems to give the most reliable results
   //
   ifstream stat_stream("/proc/self/stat",ios_base::in);

   // dummy vars for leading entries in stat that we don't care about
   //
   string pid, comm, state, ppid, pgrp, session, tty_nr;
   string tpgid, flags, minflt, cminflt, majflt, cmajflt;
   string utime, stime, cutime, cstime, priority, nice;
   string O, itrealvalue, starttime;

   // the two fields we want
   //
   unsigned long vsize;
   long rss;

   stat_stream >> pid >> comm >> state >> ppid >> pgrp >> session >> tty_nr
               >> tpgid >> flags >> minflt >> cminflt >> majflt >> cmajflt
               >> utime >> stime >> cutime >> cstime >> priority >> nice
               >> O >> itrealvalue >> starttime >> vsize >> rss; // don't care about the rest

   stat_stream.close();

   long page_size_kb = sysconf(_SC_PAGE_SIZE) / 1024; // in case x86-64 is configured to use 2MB pages
   vm_usage     = vsize / 1024.0;
   resident_set = rss * page_size_kb;
}

using namespace std;
using namespace arma;

void test_cube(double* numvec, int dim1, int dim2, int dim3) {
  double vm, rss;

  cout << "Press enter to continue";
  cin.get();

  process_mem_usage(vm, rss);
  cout << "Before:- VM: " << vm << "; RSS: " << rss << endl;

  cout << "cube c1(numvec, " << dim1 << ", " << dim2 << ", " << dim3 << ", false)" << endl;
  cube c1(numvec, dim1, dim2, dim3, false);

  process_mem_usage(vm, rss);
  cout << "After:-  VM: " << vm << "; RSS: " << rss << endl << endl;
}

int
main(int argc, char** argv)
  {
  double* numvec = new double[40000000];

  test_cube(numvec, 10000000, 2, 2);
  test_cube(numvec, 2, 10000000, 2);
  test_cube(numvec, 2, 2, 1000000);
  test_cube(numvec, 2, 2, 2000000);
  test_cube(numvec, 4, 2, 2000000);
  test_cube(numvec, 2, 4, 2000000);
  test_cube(numvec, 4, 4, 2000000);
  test_cube(numvec, 2, 2, 10000000);

  cout << "Press enter to finish";
  cin.get();

  return 0;
  }

EDIT 3 Per the create_mat code above, an arma::mat is created for each slice of a cube. On my 64-bit machine, this results in 184 bytes of overhead for each slice. For a cube with 5e7 slices, that equates to 8.6 GiB of overhead even though the underlying numeric data only takes up 1.5 GiB. I've emailed Conrad Sanderson to ask if this is fundamental to the way that Armadillo works or could be changed, but for now it definitely seems that you want your slice dimension (the third one) to be the smallest of the three if at all possible. It's also worth noting that this applies to all cubes, not just those created from existing memory. Using the arma::cube(dim1, dim2, dim3) constructor leads to the same memory usage.

like image 98
Nick Kennedy Avatar answered Sep 30 '22 16:09

Nick Kennedy