I am writing an image processing filter, and I want to speed up the computation using OpenMP. The pseudo-code structure is as follows:
for (every pixel in the image) {
    // do some stuff here
    for (every combination of parameters) {
        // do other stuff here and filter
    }
}
The code filters every pixel with different parameters and chooses the optimal ones.
My question is: which is faster, parallelizing the outer loop over pixels among the processors, or visiting the pixels sequentially and parallelizing the parameter search?
I think the question can be stated more generally: which is faster, giving each thread a large amount of work, or creating many threads that each do only a little?
I'm not concerned with implementation details for now; I can handle those with my previous OpenMP experience. Thanks!
Your goal is to distribute the data evenly over the available processors. Split the image (the outer loop) evenly, with one thread per processor core. Experiment with fine- and coarse-grained parallelism to see what gives the best results. Once the number of threads exceeds the number of available cores, you will start to see performance degradation.