
Shared vectors in OpenMP

I am trying to parallelize a program I am using and have the following question: will I lose performance if multiple threads need to read from or write to the same vector, but to different elements of it? I have the feeling that this is the reason my program hardly gets any faster when I parallelize it. Take the following code:

#include <cmath>
#include <vector>

int main(){

    std::vector<double> numbers;
    std::vector<double> results(10);
    double x;

    // write 10 values into the vector numbers
    for (int i = 0; i < 10; i++){
        numbers.push_back(std::cos(i));
    }

#pragma omp parallel for \
    private(x) \
    shared(numbers, results)
        for(int j = 0;  j < 10;  j++){

            x  =  2 * numbers[j]  +  5;  
#pragma omp critical  // do I need this ?
            {
                results[j]  =  x;     
            }
        }

    return 0;

}

Obviously the actual program does far more expensive operations; this example is only meant to illustrate my question. So can the for loop run fast and completely in parallel, or do the threads have to wait for each other because only one thread at a time can access the vector numbers, for instance, even though they are all reading different elements of it?

The same question applies to the write operation: do I need the critical pragma, or is it unnecessary because every thread writes to a different element of the vector results? I am grateful for any help, and it would also be good to know whether there is a better way to do this (maybe not using vectors at all, but plain arrays and pointers?). I also read that vectors aren't thread-safe in certain cases and that using a pointer is recommended: OpenMP and STL vector

Thanks a lot for your help!

user1304680 asked Feb 02 '23 06:02


2 Answers

I imagine that most of the issues with vectors in multiple threads come from resizing. When a vector has to grow, it copies its entire contents to a new place in memory (a larger allocated chunk). If another thread is accessing the vector at that moment, it may read an object that has already been deleted.

If you are not resizing the vector, then I have never had any trouble with concurrent reads and writes into it (as long as no two threads write to the same element, obviously).

As for the lack of a performance boost: only one thread at a time can execute the OpenMP critical section, so it will slow your program down to probably the same speed as using just 1 thread (depending on how much work is actually done outside that critical section).

You can remove the critical section statement (with the conditions above in mind).
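As a minimal sketch of the conditions above: the output vector is sized once before the parallel region, and each iteration then writes only its own index, so no critical section appears in the loop. The function name scale_and_shift is hypothetical and not part of the question's code. If the file is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical helper (not from the question): computes 2 * v + 5 per element.
std::vector<double> scale_and_shift(const std::vector<double>& numbers) {
    // Size the output once, before the parallel region. No thread calls
    // push_back or resize, so the storage is never reallocated mid-loop.
    std::vector<double> results(numbers.size());
    const long n = static_cast<long>(numbers.size());

#pragma omp parallel for
    for (long j = 0; j < n; ++j) {
        // Iteration j reads numbers[j] and writes results[j] only, so no two
        // threads touch the same element and no critical section is used.
        results[j] = 2 * numbers[j] + 5;
    }
    return results;
}
```

Calling scale_and_shift(numbers) with the question's numbers vector would stand in for the original loop; a push_back inside the parallel loop, by contrast, would be exactly the resize hazard described above.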

SirGuy answered Feb 06 '23 14:02


You get no speedup precisely because of the critical section, which is superfluous, since the same elements will never be modified at the same time. Remove the critical section and it will work just fine.

You can experiment with the schedule strategy as well: if memory access is not linear (it is linear in the example you gave), threads might fight over cache by writing elements that sit in the same cache line (false sharing). OTOH, if the number of iterations is known in advance, as in your case, and there is no branching in the loop (so all iterations run at about the same speed), static, which is IIRC the default, should work best anyway.
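To make the scheduling point concrete, here is a small sketch that requests the static schedule explicitly instead of relying on the default. The function squares and its workload are made up for illustration. With schedule(static) and no chunk size, each thread gets one contiguous block of iterations, so neighbouring threads can only share a cache line near the block boundaries.

```cpp
#include <vector>

// Hypothetical helper: fills out[i] = i * i with an explicit static schedule.
std::vector<double> squares(int n) {
    std::vector<double> out(n > 0 ? n : 0);

    // schedule(static) without a chunk size splits the iteration range into
    // roughly equal contiguous blocks, at most one block per thread.
    // schedule(static, 1) would instead deal iterations out round-robin, so
    // adjacent elements in one cache line would be written by different threads.
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        out[i] = static_cast<double>(i) * i;
    }
    return out;
}
```

Whether a different schedule helps depends on the real workload; for iterations of uneven cost, schedule(dynamic) may be worth measuring against this.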

(BTW, you can declare x inside the loop to avoid private(x), and the shared clause is implied IIRC; I have never used it.)
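A sketch of that simplification, wrapping the question's loop in a hypothetical function compute_results: x is declared in the loop body, so every iteration has its own copy, and the pragma carries no private or shared clauses. The vectors are declared outside the parallel region and are therefore shared by default.

```cpp
#include <cmath>
#include <vector>

// Hypothetical wrapper around the question's loop (the name is an assumption).
std::vector<double> compute_results(int n) {
    std::vector<double> numbers;
    // Filled serially: push_back may reallocate, so it stays out of the parallel loop.
    for (int i = 0; i < n; ++i) {
        numbers.push_back(std::cos(static_cast<double>(i)));
    }

    std::vector<double> results(numbers.size());

#pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        // x lives inside the loop body, so it is automatically private.
        const double x = 2 * numbers[j] + 5;
        results[j] = x;
    }
    return results;
}
```

compute_results(10) corresponds to the ten-element example in the question.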

eudoxos answered Feb 06 '23 14:02