Say you have a loop with a varying number of iterations, and 4 cores.
I understand that
#pragma omp parallel for
will basically divide the iterations like this, into chunks of length size/4:
| T1 | T2 | T3 | T4 |
However, in my particular situation, the following behavior would be more advantageous, where each chunk has length size/size (a single iteration). So thread 1 would not get iterations 0..size/4, but instead iterations 0, size/4, 2*size/4, 3*size/4:
|T1|T2|T3|T4|T1|T2|T3|T4|T1|T2|T3|T4|T1|T2|T3|T4|
How can I have my code execute like this when the number of iterations is not known until runtime?
What you are describing (assuming that your heuristic is size/total threads) is round-robin scheduling, i.e., static scheduling with chunk_size = 1. For that you simply need:
#pragma omp parallel for schedule(static,1)
In this case, it makes no difference whether the number of iterations is known at compile time or only at runtime.
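For what it's worth, here is a minimal sketch of how you could check the assignment yourself. The helper name `record_owners` and the `owner` array are made up for illustration; only the pragma and `omp_get_thread_num()` come from OpenMP. It records which thread ran each iteration, with `size` passed in at runtime:

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of OpenMP): stores in owner[i] the ID of
   the thread that executed iteration i. `size` is only known at runtime. */
void record_owners(int size, int *owner)
{
    assert(size <= 0 || owner != NULL);

    /* schedule(static, 1): chunks of one iteration each, handed to the
       threads round-robin in order of thread number. */
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < size; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;  /* built without OpenMP: a single thread runs everything */
#endif
    }
}
```

Built with OpenMP enabled (for example `-fopenmp` on GCC or Clang) and run with `OMP_NUM_THREADS=4`, a call such as `record_owners(16, owner)` should leave `owner` as 0,1,2,3 repeated four times, matching the pattern in the question, provided the runtime actually grants four threads.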