The test case below runs out of memory on 32-bit machines (throwing std::bad_alloc) in the loop that follows the "post MT section" message when OpenMP is used. If the OpenMP #pragmas are commented out, the code runs to completion, so it appears that memory allocated in the parallel threads is not freed correctly and we then run out of memory.
The question is whether there is something wrong with the memory allocation and deletion code below, or whether this is a bug in gcc v4.2.2 or OpenMP. I also tried gcc v4.3 and got the same failure.
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;
    {
        std::vector<std::vector<int*> > nts(100);

        // allocate many small ints from multiple threads
        #pragma omp parallel
        {
            #pragma omp for
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                for(int i = 0; i < 1000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        }

        // free everything from the main thread
        std::cout << " pre delete " << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "post MT section" << std::endl;
    {
        // same pattern, single-threaded, with twice the per-vector count
        std::vector<std::vector<int*> > nts(100);
        int begin, i;
        try {
            for(begin = 0; begin < int(nts.size()); ++begin) {
                for(i = 0; i < 2000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        } catch (std::bad_alloc &e) {
            std::cout << e.what() << std::endl;
            std::cout << "begin: " << begin << " i: " << i << std::endl;
            throw;
        }

        std::cout << "pre delete 1" << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "end of prog" << std::endl;
    char c;
    std::cin >> c;
    return 0;
}
OpenMP lets a programmer view a program as a series of serial regions and parallel regions, rather than as T concurrently-executing threads.
When run, an OpenMP program uses one thread in the sequential sections and several threads in the parallel sections. One thread, the master thread, runs from the beginning of the program to the end.
OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.
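A minimal sketch of that fork/join model (compile with -fopenmp; the output ordering will vary from run to run):

#include <iostream>
#include <omp.h>

int main()
{
    // Serial region: only the master thread (thread 0) runs here.
    std::cout << "serial, thread " << omp_get_thread_num() << std::endl;

    // Parallel region: the master thread forks a team of threads.
    #pragma omp parallel
    {
        #pragma omp critical
        std::cout << "parallel, thread " << omp_get_thread_num()
                  << " of " << omp_get_num_threads() << std::endl;
    }

    // The team joins and the master thread continues alone.
    std::cout << "serial again" << std::endl;
    return 0;
}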
Changing the iteration count in the first OpenMP loop from 1000000 to 2000000 causes the same error. This suggests the out-of-memory problem is related to the OpenMP stack limit.
Try setting the stack limit to unlimited in bash with
ulimit -s unlimited
You can also set the OpenMP environment variable OMP_STACKSIZE to 100MB or more.
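For example, in bash (the 100M value is only a starting point; adjust to what your machine can spare):

export OMP_STACKSIZE=100M

Note that OMP_STACKSIZE controls the stack size of the worker threads the OpenMP runtime creates, while ulimit -s affects the main thread's stack.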
UPDATE 1: I changed the first loop to
{
    std::vector<std::vector<int*> > nts(100);
    #pragma omp for schedule(static) ordered
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 2000000; ++i) {
            nts[begin].push_back(new int(5));
        }
    }
    std::cout << " pre delete " << std::endl;
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int j = 0; j < int(nts[begin].size()); ++j) {
            delete nts[begin][j];
        }
    }
}
Then I get a memory error at i=1574803 on the main thread.
UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).
std::cout << "Previous stack size " << kmp_get_stacksize_s() << std::endl;
kmp_set_stacksize_s(1000000000);
std::cout << "Now stack size " << kmp_get_stacksize_s() << std::endl;
UPDATE 3: For completeness, as mentioned by another member, if you are performing numerical computation it is best to preallocate everything in a single new float[1000000] instead of using OpenMP to perform 1000000 separate allocations. The same applies to allocating objects.
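A minimal sketch of that idea (the element count and the trivial per-element work are just illustrative):

#include <vector>

int main()
{
    const int n = 1000000;
    // One allocation up front instead of n small ones inside the parallel loop.
    std::vector<float> data(n);

    #pragma omp parallel for
    for(int i = 0; i < n; ++i) {
        data[i] = 5.0f;   // each thread fills its own chunk; no allocation in the loop
    }

    return 0;
}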
I have seen this issue elsewhere without OpenMP, using just pthreads. The extra memory consumption when multi-threaded appears to be typical behavior of the standard memory allocator. Switching to the Hoard allocator makes the extra memory consumption go away.
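On Linux, one way to try Hoard without recompiling is to preload it (the library path below is just an example; use wherever libhoard.so is installed on your system):

LD_PRELOAD=/usr/local/lib/libhoard.so ./your_program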