OpenMP: Huge performance differences between Visual C++ 2008 and 2010

I'm running a camera acquisition program that performs processing on acquired images, and I'm using simple OpenMP directives for this processing. So basically I wait for an image from the camera, and then process it.

After migrating to VC2010, I see a very strange CPU hog: under VC2010 my app takes nearly 100% CPU, while it takes only 10% under VC2008.

If I benchmark only the processing code, I get no difference between VC2010 and VC2008; the difference appears only when the acquisition functions are used.

I have reduced the code needed to reproduce the problem to a simple loop that does the following:

  for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    long long sum = 0;//do some simple OpenMP parallel loop
    #pragma omp parallel for reduction(+:sum)
    for (int j=0; j<size; ++j)
      sum += my_array[j];
  }

This loop eats 5% of CPU with 2008, and 70% with 2010.

I've done some profiling, which shows that in 2010 most of the time is spent in OpenMP's vcomp100.dll!_vcomp::PartialBarrierN::Block.

I have also done some concurrency profiling:

In 2008, the processing work is distributed over 3 worker threads, which are only lightly active because the processing time is much shorter than the time spent waiting for an image.

The same threads appear in 2010, but they are all 100% occupied by the PartialBarrierN::Block function. As I have four cores, these three threads consume 75% of the total CPU, which is roughly the CPU usage I see.

So it looks like there is a conflict between OpenMP and the Matrox acquisition library (proprietary). But is it a bug of VS2010 or Matrox? Is there anything I can do? Using VC++2010 is mandatory for me, so I cannot just stick with 2008.

Big thanks

STATUS UPDATE

Using the new concurrency framework, as suggested by DeadMG, leads to 40% CPU. Profiling shows that the time is spent in processing, so it doesn't exhibit the bug I'm seeing with OpenMP, but performance in my case is much poorer than with OpenMP.

STATUS UPDATE 2

I have installed an evaluation version of the latest Intel C++ compiler. It shows exactly the same performance problems!

I cross-posted to the MSDN forum.

STATUS UPDATE 3

Tested on Windows 7 64-bit and XP 32-bit, with exactly the same results (on the same machine).

CharlesB asked Jan 19 '11 16:01


3 Answers

I tested another acquisition board and the problem is identical, so the culprit is VC++2010. Microsoft made changes to its OpenMP implementation that screw up programs like mine, as a thread on the MSDN forums shows.

CharlesB answered Oct 20 '22 01:10


With OpenMP 3.0 the spinwait can be deactivated via OMP_WAIT_POLICY:

_putenv_s( "OMP_WAIT_POLICY", "PASSIVE" );

The effect is basically the same as with kmp_set_blocktime(0), but as we set the environment variable OMP_WAIT_POLICY during runtime, it'll only affect the current process and child processes.

Of course OMP_WAIT_POLICY can also be set by a launcher application, e.g. Blender handles it that way.

A hotfix for VC2010 is available; later versions such as VC2013 support it directly.

user3671833 answered Oct 20 '22 03:10


You could try the new Concurrency Runtime that ships with VS2010, starting with your test sample.

That is,

  for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    long long sum = 0;//do some simple OpenMP parallel loop
    #pragma omp parallel for reduction(+:sum)
    for (int j=0; j<size; ++j)
      sum += my_array[j];
  }

would become

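  for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    // requires #include <ppl.h>; each thread accumulates into its own partial sum
    Concurrency::combinable<long long> partial;
    const int chunk = 1000;
    const int chunks = (size + chunk - 1) / chunk;//round up so the tail is included
    Concurrency::parallel_for(0, chunks, [&](int c) {
      const int end = (c + 1) * chunk < size ? (c + 1) * chunk : size;
      for (int j = c * chunk; j < end; ++j)
        partial.local() += my_array[j];
    });
    //merge the per-thread partial sums into the final result
    long long sum = partial.combine([](long long a, long long b) { return a + b; });
  }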

Puppy answered Oct 20 '22 03:10