 

Pre-allocated private std::vector in OpenMP parallelized for loop in C++

I intend to use a buffer std::vector<size_t> buffer(100), one per thread, in a parallelized loop, as in this code:

std::vector<size_t> buffer(100);
#pragma omp parallel for private(buffer)
for(size_t j = 0; j < 10000; ++j) {
    // ... code using the buffer ...
}

This code does not work: although every thread gets its own buffer, those buffers have size 0.

How can I allocate the buffer at the beginning of each thread? Can I still use #pragma omp parallel for? And can I do it more elegantly than this:

std::vector<size_t> buffer;
#pragma omp parallel for private(buffer)
for(size_t j = 0; j < 10000; ++j) {
    if(buffer.size() != 100) {
        #pragma omp critical
        buffer.resize(100);
    }
    // ... code using the buffer ...
}

Max Flow asked Mar 11 '13 22:03


2 Answers

The question and the accepted answer have been around for a while; here is some further information that provides additional insight into OpenMP and might therefore be helpful to other users.

In C++, the private and firstprivate clauses handle class objects differently:

From the OpenMP Application Program Interface v3.1:

private: the new list item is initialized, or has an undefined initial value, as if it had been locally declared without an initializer. The order in which any default constructors for different private variables of class type are called is unspecified.

firstprivate: for variables of class type, a copy constructor is invoked to perform the initialization of list variables.

That is, private calls the default constructor of the corresponding class, whereas firstprivate calls its copy constructor.
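To see the difference directly, here is a minimal sketch with a hypothetical instrumented type, Tracer, that counts how often its default and copy constructors run for the per-thread copies. It assumes the program is compiled with OpenMP enabled (for example -fopenmp with GCC or Clang); without it the pragmas are ignored, no private copies are made, and both counts stay at zero. The names Tracer, CtorCounts, count_with_private and count_with_firstprivate are illustrative only.

```cpp
#include <atomic>

// Hypothetical instrumented type: counts default and copy constructions.
struct Tracer {
    static std::atomic<int> default_ctor;
    static std::atomic<int> copy_ctor;
    int value = 0;

    Tracer() { ++default_ctor; }
    Tracer(const Tracer& other) : value(other.value) { ++copy_ctor; }
};

std::atomic<int> Tracer::default_ctor{0};
std::atomic<int> Tracer::copy_ctor{0};

struct CtorCounts {
    int default_ctor;
    int copy_ctor;
};

// Counts constructions of the per-thread copies made by private(...).
CtorCounts count_with_private() {
    Tracer original;          // constructed before the counters are reset
    Tracer::default_ctor = 0;
    Tracer::copy_ctor = 0;
#pragma omp parallel for private(original)
    for (int j = 0; j < 100; ++j) {
        original.value += j;  // touches this thread's private copy only
    }
    return {Tracer::default_ctor.load(), Tracer::copy_ctor.load()};
}

// Counts constructions of the per-thread copies made by firstprivate(...).
CtorCounts count_with_firstprivate() {
    Tracer original;
    Tracer::default_ctor = 0;
    Tracer::copy_ctor = 0;
#pragma omp parallel for firstprivate(original)
    for (int j = 0; j < 100; ++j) {
        original.value += j;
    }
    return {Tracer::default_ctor.load(), Tracer::copy_ctor.load()};
}
```

With OpenMP enabled, count_with_private() should report at least one default construction and no copies, while count_with_firstprivate() should report at least one copy construction; the exact numbers depend on the team size.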

The default constructor of std::vector constructs an empty container with no elements; this is why the buffers have size 0.
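A small sketch that makes this visible for std::vector: each hypothetical function below records the smallest buffer size any iteration sees, using a min reduction (available since OpenMP 3.1). Compiled with OpenMP, the private version should report 0 and the firstprivate version 100; without OpenMP the pragmas are ignored and both report 100.

```cpp
#include <cstddef>
#include <vector>

// Smallest buffer size observed inside the loop when buffer is private.
std::size_t min_size_with_private() {
    std::vector<std::size_t> buffer(100);
    std::size_t min_seen = buffer.size();
#pragma omp parallel for private(buffer) reduction(min : min_seen)
    for (std::size_t j = 0; j < 10000; ++j) {
        // Each private copy was default-constructed, so it is empty.
        if (buffer.size() < min_seen) min_seen = buffer.size();
    }
    return min_seen;
}

// Same measurement with firstprivate: each copy is copy-constructed.
std::size_t min_size_with_firstprivate() {
    std::vector<std::size_t> buffer(100);
    std::size_t min_seen = buffer.size();
#pragma omp parallel for firstprivate(buffer) reduction(min : min_seen)
    for (std::size_t j = 0; j < 10000; ++j) {
        if (buffer.size() < min_seen) min_seen = buffer.size();
    }
    return min_seen;
}
```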

To answer the question, here is another solution that does not require splitting the OpenMP region:

std::vector<size_t> buffer(100, 0);  
#pragma omp parallel for firstprivate(buffer)
for (size_t j = 0; j < 10000; ++j) {
  // use the buffer
}
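For completeness, a self-contained sketch of the firstprivate approach. The function name fill_sums and the per-iteration work (filling the buffer with j, j+1, ..., j+99 and storing its sum) are assumptions made only to give the buffer something to do; the result is the same whether or not OpenMP is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical workload: for each j, fill a 100-element scratch buffer with
// j, j+1, ..., j+99 and store the buffer's sum in out[j].
std::vector<std::size_t> fill_sums(std::size_t n) {
    std::vector<std::size_t> out(n);
    std::vector<std::size_t> buffer(100, 0);  // template for the per-thread copies

#pragma omp parallel for firstprivate(buffer)
    for (std::size_t j = 0; j < n; ++j) {
        // firstprivate copy-constructed the buffer, so it already has 100 slots.
        assert(buffer.size() == 100);
        for (std::size_t k = 0; k < buffer.size(); ++k) {
            buffer[k] = j + k;
        }
        // Each iteration writes a distinct element of out, so there is no race.
        out[j] = std::accumulate(buffer.begin(), buffer.end(), std::size_t{0});
    }
    return out;
}
```

For example, fill_sums(10000)[1] is 1 + 2 + ... + 100 = 5050. The buffer is copied once per thread when the region starts, not once per iteration.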

EDIT: A word of caution regarding private variables in general. The thread stack size is limited and, unless explicitly set (environment variable OMP_STACKSIZE), compiler dependent. If you use private variables with a large memory footprint, stack overflow may become an issue.
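A short sketch of why the container type matters here, assuming a typical implementation in which private copies are placed on each thread's stack: a fixed-size array listed in a private clause occupies its full size on every thread's stack, whereas a std::vector contributes only its small control object (typically three pointers) and keeps its elements on the heap. The function name and buffer size are hypothetical. If a large stack object is unavoidable, the per-thread stack can be enlarged before launching the program, for example with OMP_STACKSIZE=64M.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical size: 2^17 doubles, about 1 MB of scratch data per thread.
constexpr std::size_t kScratchSize = std::size_t{1} << 17;

// The firstprivate copy of a std::vector keeps its elements on the heap; only
// the vector object itself is on the thread's stack. A plain
// `double scratch[kScratchSize]` in the same clause would instead put the
// whole ~1 MB array on every thread's stack.
double sum_of_indices(std::size_t iterations) {
    std::vector<double> scratch(kScratchSize, 0.0);
    double total = 0.0;
#pragma omp parallel for firstprivate(scratch) reduction(+ : total)
    for (std::size_t j = 0; j < iterations; ++j) {
        std::size_t slot = j % kScratchSize;
        scratch[slot] = static_cast<double>(j);  // write to this thread's copy
        total += scratch[slot];                  // read it back
    }
    return total;
}
```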

curly_pinguin answered Oct 21 '22 14:10


Split the OpenMP region as shown in this question.

Then declare the vector inside the outer parallel region, but outside the for loop itself. This creates one local vector per thread, constructed once per thread rather than on every iteration.

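#pragma omp parallel
{
    // Constructed once per thread, not once per iteration.
    std::vector<size_t> buffer(100);

    // The iterations are divided among the threads of the enclosing region.
#pragma omp for
    for(size_t j = 0; j < 10000; ++j) {
        // ... code using the buffer ...
    }
}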
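A runnable sketch of the split-region approach that also counts how many buffers get constructed. The names SplitResult and split_region, the counter, and the per-iteration work are illustrative additions. With OpenMP enabled the count should be at most the number of threads in the team; without OpenMP it is exactly 1. In neither case is a buffer built per iteration.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct SplitResult {
    std::vector<std::size_t> out;  // out[j] = last buffer element for iteration j
    int buffers_constructed;       // how many times a buffer was built
};

SplitResult split_region(std::size_t n) {
    SplitResult result{std::vector<std::size_t>(n), 0};
    std::atomic<int> constructed{0};

#pragma omp parallel
    {
        // Executed once per thread: one 100-element buffer per thread.
        std::vector<std::size_t> buffer(100);
        ++constructed;

#pragma omp for
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < buffer.size(); ++k) {
                buffer[k] = j * k;
            }
            result.out[j] = buffer[99];  // distinct index per iteration: no race
        }
    }
    result.buffers_constructed = constructed.load();
    return result;
}
```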

Mysticial answered Oct 21 '22 13:10