The correct usage of nested #pragma omp for directives

Tags: c++, openmp

The following code ran like a charm before OpenMP parallelization was applied. Now it is stuck in an endless loop! I'm sure that's the result of my incorrect use of the OpenMP directives. Would you please show me the correct way? Thank you very much.

          #pragma omp parallel for
          for (int nY = nYTop; nY <= nYBottom; nY++)
          {   
              for (int nX = nXLeft; nX <= nXRight; nX++)
              {   
                  // Use look-up table for performance
                  dLon = theApp.m_LonLatLUT.LonGrid()[nY][nX] + m_FavoriteSVISSRParams.m_dNadirLon;
                  dLat = theApp.m_LonLatLUT.LatGrid()[nY][nX];

                  // If you don't want to use longitude/latitude look-up table, uncomment the following line
                  //NOMGeoLocate.XYToGEO(dLon, dLat, nX, nY);

                  if (dLon > 180 || dLat > 180)
                  {  
                     continue;
                  }

                  if (Navigation.GeoToXY(dX, dY, dLon, dLat, 0) > 0) 
                  {  
                     continue;
                  }

                  // Skip void data scanline
                  dY = dY - nScanlineOffset;

                  // Compute coefficients as well as its four neighboring points' values
                  nX1 = int(dX);
                  nX2 = nX1 + 1;
                  nY1 = int(dY);
                  nY2 = nY1 + 1;

                  dCx = dX - nX1;
                  dCy = dY - nY1;

                  dP1 = pIRChannelData->operator [](nY1)[nX1];
                  dP2 = pIRChannelData->operator [](nY1)[nX2];
                  dP3 = pIRChannelData->operator [](nY2)[nX1];
                  dP4 = pIRChannelData->operator [](nY2)[nX2];

                  // Bilinear interpolation
                  usNomDataBlock[nY][nX] = (unsigned short)BilinearInterpolation(dCx, dCy, dP1, dP2, dP3, dP4);
              } 
          }
GoldenLee asked Dec 16 '22 07:12

1 Answer

Don't nest it too deep. Usually, it is enough to identify one good point for parallelization and get away with a single directive.

Some comments and probably the root of your problem:

      #pragma omp parallel default(shared)  // Here you open several threads ...
      {   
          #pragma omp for
          for (int nY = nYTop; nY <= nYBottom; nY++)  
          {                                          

              #pragma omp parallel shared(nY, nYBottom) // Same here ...
              {   
                  #pragma omp for
                  for (int nX = nXLeft; nX <= nXRight; nX++)
                  { 

(Conceptually) you are opening many threads; in each of them, you open many threads again for the for loop; and for each thread in that loop, you open many threads yet again for the inner for loop.

That's (thread (thread)*)+ in pattern-matching terms; it should be just thread+.

Just do a single parallel for. Don't be too fine-grained; parallelize the outer loop so that each thread runs as long as possible:

#pragma omp parallel for
for (int nY = nYTop; nY <= nYBottom; nY++)
{      
    for (int nX = nXLeft; nX <= nXRight; nX++)
    {
    }
}

Avoid data and cache sharing between the threads (another reason why the threads shouldn't be too fine-grained on your data).

If that's running stable and shows good speed up, you can fine tune it with different scheduling algorithms as per your OpenMP reference card.

And put your variable declarations where you really need them. Do not overwrite what is read by sibling threads.

Sebastian Mach answered Dec 31 '22 02:12