 

Why OpenMP version is slower?

Tags:

c++

openmp

I am experimenting with OpenMP. I wrote some code to check its performance. On a single 4-core Intel CPU with Kubuntu 11.04, the following program runs around 20 times slower when compiled with OpenMP than when compiled without it. Why?

I compiled it with g++ -g -O2 -funroll-loops -fomit-frame-pointer -march=native -fopenmp

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;
  long double k=0.7;

  #pragma omp parallel for reduction(+:i)
  for(int t=1; t<300000000; t++){       
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}
Duncan asked Jun 28 '11 13:06


2 Answers

The problem is that the variable k is treated as a shared variable, so it has to be synchronized between the threads. A possible solution to avoid this is:

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;

#pragma omp parallel for reduction(+:i)
  for(int t=1; t<30000000; t++){       
    long double k=0.7;
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}

Following Martin Beckett's hint in the comments: instead of declaring k inside the loop, you can also declare k const and outside the loop.

Otherwise, ejd is correct - the problem here does not seem to be bad parallelization, but bad optimization when the code is parallelized. Remember that gcc's OpenMP implementation is still fairly young and far from optimal.

2 revs answered Oct 13 '22 22:10

Fastest code:

for (int i = 0; i < 100000000; i ++) {;}

Slightly slower code:

#pragma omp parallel for num_threads(1)
for (int i = 0; i < 100000000; i ++) {;}

2-3 times slower code:

#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}

no matter what is between { and }: a simple ; or a more complex computation gives the same results. I compiled under Ubuntu 13.10 64-bit, using both gcc and g++, trying different flags (-ansi -pedantic-errors -Wall -Wextra -O3), and ran on a 3.5 GHz Intel quad-core.

I guess thread-management overhead is at fault? It doesn't seem smart for OpenMP to create a thread every time you need one and destroy it afterwards. I thought there would be four (or eight) threads either running whenever needed or sleeping.

George answered Oct 13 '22 23:10