I have 8 processors at my disposal. I wanted to do parallel resizes as follows:
vector<vector<int> > test;
test.resize(10000);
#pragma omp parallel num_threads(8)
{
    #pragma omp for
    for (int i = 0; i < 10000; i++)
        test[i].resize(500000);  // each iteration allocates one 500000-int buffer
}
I noticed that the program did not use 100% of the processor power; it used only 15%. When I changed the code to
vector<vector<int> > test;
test.resize(1000000);
#pragma omp parallel num_threads(8)
{
    #pragma omp for
    for (int i = 0; i < 1000000; i++)
        test[i].resize(5000);  // each iteration allocates one 5000-int buffer
}
the program used about 60% of the processor power. I don't understand this; I expected it to use 100% in both cases. Am I missing something here?
On Windows, the CRT uses the built-in Windows heap implementation, which serializes allocations coming from different threads.
HeapAlloc holds a lock (a critical section, essentially a mutex) for the duration of each allocation, so concurrent allocations effectively run one at a time.
Since vector resizing is mostly heap (re)allocation, you will not see much improvement from parallelizing it.
The Windows heap documentation describes this trade-off: "Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap."
To benefit from parallel memory allocation, use a different heap allocator that is designed for multithreaded workloads, for example jemalloc.