 

Memory leak when using OpenMP

The test case below runs out of memory on 32-bit machines (throwing std::bad_alloc) in the loop following the "post MT section" message when OpenMP is used. However, if the OpenMP #pragmas are commented out, the code runs through to completion fine. It therefore appears that memory allocated in the parallel threads is not freed correctly, and we run out of memory.

The question is whether there is something wrong with the memory allocation and deletion code below, or whether this is a bug in gcc v4.2.2 or in OpenMP? I also tried gcc v4.3 and got the same failure.

#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;

    {
            std::vector<std::vector<int*> > nts(100);
            #pragma omp parallel
            {
                    #pragma omp for
                    for(int begin = 0; begin < int(nts.size()); ++begin) {
                            for(int i = 0; i < 1000000; ++i) {
                                    nts[begin].push_back(new int(5));
                            }
                    }
            }

    std::cout << "  pre delete " << std::endl;
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                    for(std::size_t j = 0; j < nts[begin].size(); ++j) {
                            delete nts[begin][j];
                    }
            }
    }
    std::cout << "post MT section" << std::endl;
    {
            std::vector<std::vector<int*> > nts(100);
            int begin, i;
            try {
              for(begin = 0; begin < int(nts.size()); ++begin) {
                    for(i = 0; i < 2000000; ++i) {
                            nts[begin].push_back(new int(5));
                    }
              }
            } catch (std::bad_alloc &e) {
                    std::cout << e.what() << std::endl;
                    std::cout << "begin: " << begin << " i: " << i << std::endl;
                    throw;
            }
            std::cout << "pre delete 1" << std::endl;

            for(int begin = 0; begin < int(nts.size()); ++begin) {
                    for(std::size_t j = 0; j < nts[begin].size(); ++j) {
                            delete nts[begin][j];
                    }
            }
    }

    std::cout << "end of prog" << std::endl;

    char c;
    std::cin >> c;

    return 0;
}
asked Dec 02 '10 15:12 by WilliamKF


2 Answers

Changing the allocation count in the first OpenMP loop from 1000000 to 2000000 causes the same error. This suggests that the out-of-memory problem is related to the OpenMP stack limit.

Try setting the stack limit to unlimited in bash with

ulimit -s unlimited

You can also set the OpenMP environment variable OMP_STACKSIZE, which controls the stack size of each worker thread, to 100 MB or more.
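For example, in bash (a minimal sketch; ./a.out is a placeholder for the compiled test case, and raising the stack limit may require an unlimited hard limit):

```shell
ulimit -s unlimited          # stack limit for the main thread
export OMP_STACKSIZE=100M    # stack size for each OpenMP worker thread
# ./a.out                    # then run the program from the same shell
echo "OMP_STACKSIZE is $OMP_STACKSIZE"
```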

UPDATE 1: I changed the first loop to

{
    std::vector<std::vector<int*> > nts(100);
    #pragma omp for schedule(static) ordered
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 2000000; ++i) {
            nts[begin].push_back(new int(5));
        }
    }

    std::cout << "  pre delete " << std::endl;
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(std::size_t j = 0; j < nts[begin].size(); ++j) {
            delete nts[begin][j];
        }
    }
}

Then I get a memory error at i=1574803 on the main thread.

UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).

std::cout << "Previous stack size " << kmp_get_stacksize_s() << std::endl;
kmp_set_stacksize_s(1000000000);
std::cout << "Now stack size " << kmp_get_stacksize_s() << std::endl;

UPDATE 3: For completeness, as mentioned by another member, if you are performing numerical computation, it is best to preallocate everything in a single buffer (e.g. one new float[1000000]) rather than using OpenMP to perform 1000000 separate allocations. This applies to allocating objects as well.

answered Oct 20 '22 11:10 by Dat Chu


I found this issue occurs elsewhere without OpenMP, using just pthreads. The extra memory consumption when multi-threaded appears to be typical behavior of the standard memory allocator, which keeps freed memory in per-thread arenas rather than returning it to the system. By switching to the Hoard allocator, the extra memory consumption goes away.

answered Oct 20 '22 09:10 by WilliamKF