The test case below runs out of memory on 32-bit machines (throwing std::bad_alloc) in the loop that follows the "post MT section" message when OpenMP is used. If the OpenMP #pragmas are commented out, the code runs to completion, so it appears that memory allocated in the parallel threads is not freed correctly and we then run out of memory.
The question is whether there is something wrong with the memory allocation and deletion code below, or whether this is a bug in gcc v4.2.2 or OpenMP. I also tried gcc v4.3 and got the same failure.
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;
    {
        std::vector<std::vector<int*> > nts(100);

        // allocate many small ints from multiple threads
        #pragma omp parallel
        {
            #pragma omp for
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                for(int i = 0; i < 1000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        }

        // free everything from the main thread
        std::cout << " pre delete " << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "post MT section" << std::endl;
    {
        // same pattern, single-threaded, with twice the per-vector count
        std::vector<std::vector<int*> > nts(100);
        int begin, i;
        try {
            for(begin = 0; begin < int(nts.size()); ++begin) {
                for(i = 0; i < 2000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        } catch (std::bad_alloc &e) {
            std::cout << e.what() << std::endl;
            std::cout << "begin: " << begin << " i: " << i << std::endl;
            throw;
        }

        std::cout << "pre delete 1" << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "end of prog" << std::endl;
    char c;
    std::cin >> c;
    return 0;
}
OpenMP lets a programmer view a program as a series of serial regions and parallel regions, rather than as T concurrently-executing threads.
When run, an OpenMP program uses one thread in the sequential sections and several threads in the parallel sections. One thread, the master thread, runs from the beginning of the program to the end.
OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.
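A minimal sketch of that fork/join model (compile with -fopenmp; the output ordering will vary from run to run):

#include <iostream>
#include <omp.h>

int main()
{
    // Serial region: only the master thread (thread 0) runs here.
    std::cout << "serial, thread " << omp_get_thread_num() << std::endl;

    // Parallel region: the master thread forks a team of threads.
    #pragma omp parallel
    {
        #pragma omp critical
        std::cout << "parallel, thread " << omp_get_thread_num()
                  << " of " << omp_get_num_threads() << std::endl;
    }

    // The team joins and the master thread continues alone.
    std::cout << "serial again" << std::endl;
    return 0;
}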
Changing the iteration count in the first OpenMP loop from 1000000 to 2000000 causes the same error. This suggests the out-of-memory problem is related to the OpenMP stack limit.
Try setting the stack limit to unlimited in bash with
ulimit -s unlimited
You can also set the OpenMP environment variable OMP_STACKSIZE to 100MB or more.
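For example, in bash (the 100M value is only a starting point; adjust to what your machine can spare):

export OMP_STACKSIZE=100M

Note that OMP_STACKSIZE controls the stack size of the worker threads the OpenMP runtime creates, while ulimit -s affects the main thread's stack.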
UPDATE 1: I changed the first loop to
{
    std::vector<std::vector<int*> > nts(100);
    #pragma omp for schedule(static) ordered
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 2000000; ++i) {
            nts[begin].push_back(new int(5));
        }
    }
    std::cout << " pre delete " << std::endl;
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int j = 0; j < int(nts[begin].size()); ++j) {
            delete nts[begin][j];
        }
    }
}
Then I get a memory error at i=1574803 on the main thread.
UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).
std::cout << "Previous stack size " << kmp_get_stacksize_s() << std::endl;
kmp_set_stacksize_s(1000000000);
std::cout << "Now stack size " << kmp_get_stacksize_s() << std::endl;
UPDATE 3: For completeness, as mentioned by another member, if you are performing numerical computation it is best to preallocate everything in a single new float[1000000] instead of using OpenMP to perform 1000000 separate allocations. The same applies to allocating objects.
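A minimal sketch of that idea (the element count and the trivial per-element work are just illustrative):

#include <vector>

int main()
{
    const int n = 1000000;
    // One allocation up front instead of n small ones inside the parallel loop.
    std::vector<float> data(n);

    #pragma omp parallel for
    for(int i = 0; i < n; ++i) {
        data[i] = 5.0f;   // each thread fills its own chunk; no allocation in the loop
    }

    return 0;
}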
I have seen this issue elsewhere without OpenMP, using just pthreads. The extra memory consumption when multi-threaded appears to be typical behavior of the standard memory allocator. Switching to the Hoard allocator makes the extra memory consumption go away.
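On Linux, one way to try Hoard without recompiling is to preload it (the library path below is just an example; use wherever libhoard.so is installed on your system):

LD_PRELOAD=/usr/local/lib/libhoard.so ./your_program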