I am writing an image processing filter, and I want to speed up the computation using OpenMP. The pseudo-code structure is as follows:
for (every pixel in the image) {
    // do some stuff here
    for (every combination of parameters) {
        // do other stuff here and filter
    }
}
The code filters every pixel with different parameters and chooses the optimal ones.
My question is: which is faster, parallelizing the outer loop over pixels among the processors, or visiting the pixels sequentially and parallelizing the parameter search?
I think the question can be stated more generally: which is faster, giving each thread a large amount of work, or creating many threads that each do only a little?
I'm not concerned with implementation details for now; I can handle those with my previous OpenMP experience. Thanks!
Your goal is to distribute the data evenly over the available processors. Split the image (the outer loop) evenly, with one thread per processor core. Experiment with fine- and coarse-grained parallelism to see what gives the best results. Once the number of threads exceeds the number of available cores, you will start to see performance degradation.