 

Parallel for loop in OpenMP

I'm trying to parallelize a very simple for loop, but this is my first attempt at using OpenMP in a long time. I'm baffled by the run times. Here is my code:

#include <vector>
#include <algorithm>
#include <cmath>
#include <iostream>

using namespace std;

int main()
{
    int n = 400000, m = 1000;
    double x = 0, y = 0;
    double s = 0;
    vector<double> shifts(n, 0);

    #pragma omp parallel for
    for (int j = 0; j < n; j++) {
        double r = 0.0;
        for (int i = 0; i < m; i++) {
            double rand_g1 = cos(i / double(m));
            double rand_g2 = sin(i / double(m));

            x += rand_g1;
            y += rand_g2;
            r += sqrt(rand_g1 * rand_g1 + rand_g2 * rand_g2);
        }
        shifts[j] = r / m;
    }

    cout << *std::max_element(shifts.begin(), shifts.end()) << endl;
}

I compile it with

g++ -O3 testMP.cc -o testMP  -I /opt/boost_1_48_0/include 

that is, no "-fopenmp", and I get these timings:

real    0m18.417s
user    0m18.357s
sys     0m0.004s

when I do use "-fopenmp",

g++ -O3 -fopenmp testMP.cc -o testMP  -I /opt/boost_1_48_0/include 

I get these numbers for the times:

real    0m6.853s
user    0m52.007s
sys     0m0.008s

which doesn't make sense to me. How can using eight cores result in only a 3-fold performance increase? Am I coding the loop correctly?

dsign, asked Aug 02 '12

People also ask

How do you parallelize a loop using OpenMP?

The #pragma omp parallel for directive creates a parallel region (as described before), and the iterations of the loop that it encloses are assigned to the threads of that region, using the default chunk size and the default schedule, which is typically static.
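To make those defaults visible, here is a minimal sketch (not from the original answer) that writes schedule(static) out explicitly and prints which thread ran each iteration; compile with -fopenmp:

#include <cstdio>
#include <omp.h>

int main()
{
    // With no schedule clause, most implementations default to
    // schedule(static): the iterations are split into roughly equal
    // contiguous chunks, one chunk per thread.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < 8; i++)
        printf("iteration %d ran on thread %d\n", i, omp_get_thread_num());
}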

What is parallel for in OpenMP?

The OpenMP directive #pragma omp parallel creates a parallel region with a team of threads, where each thread executes the entire block of code that the parallel region encloses.
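As a minimal sketch of a bare parallel region: every thread in the team executes the whole block, so the message below prints once per thread.

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        // Each thread executes this entire block independently.
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
}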

Is OpenMP parallel or concurrent?

Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
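Here is a minimal sketch (assumed, not from the original answer) contrasting the two styles inside one parallel region: sections hands different code blocks to different threads (task parallelism), while for splits one loop's iterations among them (data parallelism).

#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v(100, 1);
    long sum = 0;

    #pragma omp parallel
    {
        // Task parallelism: different code blocks go to different threads.
        #pragma omp sections
        {
            #pragma omp section
            { printf("task A\n"); }
            #pragma omp section
            { printf("task B\n"); }
        }

        // Data parallelism: same loop body, iterations split among threads.
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < (int)v.size(); i++)
            sum += v[i];
    }

    printf("sum = %ld\n", sum);
}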


2 Answers

You should make use of the OpenMP reduction clause for x and y:

#pragma omp parallel for reduction(+:x,y)
for (int j = 0; j < n; j++) {
    double r = 0.0;
    for (int i = 0; i < m; i++) {
        double rand_g1 = cos(i / double(m));
        double rand_g2 = sin(i / double(m));

        x += rand_g1;
        y += rand_g2;
        r += sqrt(rand_g1 * rand_g1 + rand_g2 * rand_g2);
    }
    shifts[j] = r / m;
}

With reduction, each thread accumulates its own partial sums in private copies of x and y, and at the end the partial values are summed together to obtain the final result.
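For intuition, here is a hand-rolled sketch of roughly what reduction(+:x,y) does behind the scenes (the actual runtime may combine the partials differently):

#include <cmath>

int main()
{
    const int n = 400000, m = 1000;
    double x = 0, y = 0;

    #pragma omp parallel
    {
        // Each thread accumulates into private copies, so there is no
        // contention on the shared x and y inside the hot loop.
        double x_local = 0, y_local = 0;

        #pragma omp for
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                x_local += cos(i / double(m));
                y_local += sin(i / double(m));
            }
        }

        // Combine the partial sums once per thread at the end.
        #pragma omp critical
        {
            x += x_local;
            y += y_local;
        }
    }
}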

Serial version:
25.05s user 0.01s system 99% cpu 25.059 total

OpenMP version w/ OMP_NUM_THREADS=16:
24.76s user 0.02s system 1590% cpu 1.559 total

See - superlinear speed-up :)

Hristo Iliev, answered Sep 18 '22


Let's try to understand how to parallelize a simple for loop using OpenMP:

#pragma omp parallel
#pragma omp for
for (i = 1; i < 13; i++)
{
    c[i] = a[i] + b[i];
}

Assume that we have 3 available threads; this is what will happen:

[diagram: the 12 iterations divided among the 3 threads]

First,

  • threads are assigned an independent set of iterations,

and finally,

  • threads must wait at the end of the work-sharing construct (an implicit barrier; see the sketch below).
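A minimal sketch of that barrier (array names and sizes assumed): by default every thread waits at the end of the for construct, while adding nowait lets a thread continue as soon as its own iterations are done.

#include <cstdio>
#include <omp.h>

int main()
{
    int a[13], b[13], c[13];
    for (int i = 0; i < 13; i++) { a[i] = i; b[i] = i; }

    #pragma omp parallel num_threads(3)
    {
        #pragma omp for nowait   // drop the implicit end-of-loop barrier
        for (int i = 1; i < 13; i++)
            c[i] = a[i] + b[i];

        // Without nowait, no thread would reach this line until all
        // iterations above were finished.
        printf("thread %d finished its chunk\n", omp_get_thread_num());
    }
}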
Basheer AL-MOMANI, answered Sep 19 '22