The following code runs like a charm before OpenMP parallelization was applied. In fact, the following code was in a state of endless loop! I'm sure that's result from my incorrect use to the OpenMP directives. Would you please show me the correct way? Thank you very much.
#pragma omp parallel for
for (int nY = nYTop; nY <= nYBottom; nY++)
{
for (int nX = nXLeft; nX <= nXRight; nX++)
{
// Use look-up table for performance
dLon = theApp.m_LonLatLUT.LonGrid()[nY][nX] + m_FavoriteSVISSRParams.m_dNadirLon;
dLat = theApp.m_LonLatLUT.LatGrid()[nY][nX];
// If you don't want to use longitude/latitude look-up table, uncomment the following line
//NOMGeoLocate.XYToGEO(dLon, dLat, nX, nY);
if (dLon > 180 || dLat > 180)
{
continue;
}
if (Navigation.GeoToXY(dX, dY, dLon, dLat, 0) > 0)
{
continue;
}
// Skip void data scanline
dY = dY - nScanlineOffset;
// Compute coefficients as well as its four neighboring points' values
nX1 = int(dX);
nX2 = nX1 + 1;
nY1 = int(dY);
nY2 = nY1 + 1;
dCx = dX - nX1;
dCy = dY - nY1;
dP1 = pIRChannelData->operator [](nY1)[nX1];
dP2 = pIRChannelData->operator [](nY1)[nX2];
dP3 = pIRChannelData->operator [](nY2)[nX1];
dP4 = pIRChannelData->operator [](nY2)[nX2];
// Bilinear interpolation
usNomDataBlock[nY][nX] = (unsigned short)BilinearInterpolation(dCx, dCy, dP1, dP2, dP3, dP4);
}
}
Don't nest it too deep. Usually, it would be enough to identify a good point for parallelization and get away with just one directive.
Some comments and probably the root of your problem:
#pragma omp parallel default(shared) // Here you open several threads ...
{
#pragma omp for
for (int nY = nYTop; nY <= nYBottom; nY++)
{
#pragma omp parallel shared(nY, nYBottom) // Same here ...
{
#pragma omp for
for (int nX = nXLeft; nX <= nXRight; nX++)
{
(Conceptually) you are opening many threads, in each of them you open many threads again in the for loop. For each thread in the for loop, you open many threads again, and for each of those, you open again many in another for loop.
That's (thread (thread)*)+
in pattern matching words; there should just be thread+
Just do a single parallel for. Don't be to fine-grained, parallelize on the outer loop, each thread should run as long as possible:
#pragma omp parallel for
for (int nY = nYTop; nY <= nYBottom; nY++)
{
for (int nX = nXLeft; nX <= nXRight; nX++)
{
}
}
Avoid data and cache sharing between the threads (another reason why the threads shouldn't be too fine grained on your data).
If that's running stable and shows good speed up, you can fine tune it with different scheduling algorithms as per your OpenMP reference card.
And put your variable declarations to where you really need them. Do not overwrite what is read by sibling threads.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With