I wrote a number crunching algorithm. The idea is that:
It seems that the program slowly eats memory so I suspect a memory leak. I have tried Address Sanitizer from Clang and Pointer Checker from Intel but they don't find anything.
Now, I am looking at the memory consumption in my Activity Monitor (I am running OSX, but I get the same memory usage from the Unix command "top") and just before the big function is called, the program takes 2 MB. When running the function, the program takes 120 MB. What is strange is that when the program ends up the big function and comes back inside the loop, it now takes 37 MB! Then, when it goes back into the big function, it takes 130 MB. Again, coming back in the loop, it takes 36 MB, then in the big function it takes 140 MB...
So it is slowly drifting away, but not with a regular pattern. How should I trust the memory usage in "top"?
Can memory fragmentation increase the memory usage without memory leak?
I let the program run overnight, and here is the data I get:
So it seems that the function that allocates and deallocates memory (about 120 MB) seems to "leak" 1 MB each time it is called.
First, make sure that over a long period of time (for example if one iteration takes a minute, run a couple hours) the growth continues. If the growths asyptotes then there's no problem. Next I would try valgrind. Then if that doesn't help, you'll have to binary search your code: Comment out bits until the growth stops. I would start by totally removing use of the MKL library (leave stubs if you want to) and see what happens. Next, change your vector to std::vector just to see if that helps it. After that you'll have to use your judgment.
I think that I have found the culprit: the MKL (the latest version as of today). I use Pardiso, and the following example leaks very slowly: about 0.1 MB every 13 seconds which leads to 280 MB overnight. These are the numbers I get from my simulation.
If you want to give it a try, you can compile it with:
icpc -std=c++11 pardiso-leak.cpp -o main -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -ldl -lpthread -lm
Thanks everyone for your help. I have reported the bug to Intel.
#include <iostream>
#include <vector>
#include "mkl_pardiso.h"
#include "mkl_types.h"
int main (int argc, char const *argv[])
{
  const auto n = std::size_t{1000};
  auto m = MKL_INT{n * n};
  auto values = std::vector<double>();
  auto column = std::vector<MKL_INT>();
  auto row = std::vector<MKL_INT>();
  row.push_back(1);
  for(std::size_t j = 0; j < n; ++j) {
    column.push_back(j + 1);
    values.push_back(1.0);
    column.push_back(j + n + 1);
    values.push_back(0.1);
    row.push_back(column.size() + 1);
  }
  for(std::size_t i = 1; i < n - 1; ++i) {
    for(std::size_t j = 0; j < n; ++j) {
      column.push_back(n * i + j - n + 1);
      values.push_back(0.1);
      column.push_back(n * i + j + 1);
      values.push_back(1.0);
      column.push_back(n * i + j + n + 1);
      values.push_back(0.1);
      row.push_back(column.size() + 1);
    }
  }
  for(std::size_t j = 0; j < n; ++j) {
    column.push_back((n - 1) * n + j - n + 1);
    values.push_back(0.1);
    column.push_back((n - 1) * n + j + 1);
    values.push_back(1.0);
    row.push_back(column.size() + 1);
  }
  auto y = std::vector<double>(m, 1.0);
  auto x = std::vector<double>(m, 0.0);
  auto pardiso_nrhs = MKL_INT{1};
  auto pardiso_max_fact = MKL_INT{1};
  auto pardiso_mnum = MKL_INT{1};
  auto pardiso_mtype = MKL_INT{11};
  auto pardiso_msglvl = MKL_INT{0};
  MKL_INT pardiso_iparm[64];
  for (int i = 0; i < 64; ++i) {
    pardiso_iparm[i] = 0;
  }
  pardiso_iparm[0] = 1;
  pardiso_iparm[1] = 2;
  pardiso_iparm[3] = 0;
  pardiso_iparm[4] = 0;
  pardiso_iparm[5] = 0;
  pardiso_iparm[7] = 0;
  pardiso_iparm[8] = 0;
  pardiso_iparm[9] = 13;
  pardiso_iparm[10] = 1;
  pardiso_iparm[11] = 0;
  pardiso_iparm[12] = 1;
  pardiso_iparm[17] = -1;
  pardiso_iparm[18] = 0;
  pardiso_iparm[20] = 0;
  pardiso_iparm[23] = 1;
  pardiso_iparm[24] = 0;
  pardiso_iparm[26] = 0;
  pardiso_iparm[27] = 0;
  pardiso_iparm[30] = 0;
  pardiso_iparm[31] = 0;
  pardiso_iparm[32] = 0;
  pardiso_iparm[33] = 0;
  pardiso_iparm[34] = 0;
  pardiso_iparm[59] = 0;
  pardiso_iparm[60] = 0;
  pardiso_iparm[61] = 0;
  pardiso_iparm[62] = 0;
  pardiso_iparm[63] = 0;
  void* pardiso_pt[64];
  for (int i = 0; i < 64; ++i) {
    pardiso_pt[i] = nullptr;
  }
  auto error = MKL_INT{0};
  auto phase = MKL_INT{11};
  MKL_INT i_dummy;
  double d_dummy;
  PARDISO(pardiso_pt, &pardiso_max_fact, &pardiso_mnum, &pardiso_mtype,
          &phase, &m, values.data(), row.data(), column.data(), &i_dummy,
          &pardiso_nrhs, pardiso_iparm, &pardiso_msglvl, &d_dummy,
          &d_dummy, &error);
  phase = 22;
  PARDISO(pardiso_pt, &pardiso_max_fact, &pardiso_mnum, &pardiso_mtype,
          &phase, &m, values.data(), row.data(), column.data(), &i_dummy,
          &pardiso_nrhs, pardiso_iparm, &pardiso_msglvl, &d_dummy,
          &d_dummy, &error);
  phase = 33;
  for(size_t i = 0; i < 10000; ++i) {
    std::cout << "i = " << i << std::endl;
    PARDISO(pardiso_pt, &pardiso_max_fact, &pardiso_mnum, &pardiso_mtype,
            &phase, &m, values.data(), row.data(), column.data(), &i_dummy,
            &pardiso_nrhs, pardiso_iparm, &pardiso_msglvl, y.data(),
            x.data(), &error);
  }
  phase = -1;
  PARDISO(pardiso_pt, &pardiso_max_fact, &pardiso_mnum, &pardiso_mtype,
          &phase, &m, values.data(), row.data(), column.data(), &i_dummy,
          &pardiso_nrhs, pardiso_iparm, &pardiso_msglvl, &d_dummy,
          &d_dummy, &error);
  return 0;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With