parfor
is a convenient way to distribute independent iterations of intensive computations among several "workers". One meaningful restriction is that parfor
-loops cannot be nested, and invariably, that is the answer to similar questions like there and there.
Why parallelization across loop boundaries is so desirable
Consider the following piece of code where iterations take a highly variable amount of time on a machine that allows 4 workers. Both loops iterate over 6 values, clearly hard to share among 4.
for row = 1:6
parfor col = 1:6
somefun(row, col);
end
end
It seems like a good idea to choose the inner loop for parfor
because individual calls to somefun
are more variable than iterations of the outer loop. But what if the run time for each call to somefun
is very similar? What if there are trends in run time and we have three nested loops? These questions come up regularly, and people go to extremes.
Pattern needed for combining loops
Ideally, somefun
is run for all pairs of row
and col
, and workers should get busy irrespectively of which iterand is being varied. The solution should look like
parfor p = allpairs(1:6, 1:6)
somefun(p(1), p(2));
end
Unfortunately, even if I knew which builtin function creates a matrix with all combinations of row
and col
, MATLAB would complain with an error The range of a parfor statement must be a row vector. Yet, for
would not complain and nicely iterate over columns. An easy workaround would be to create that matrix and then index it with parfor
:
p = allpairs(1:6, 1:6);
parfor k = 1:size(pairs, 2)
row = p(k, 1);
col = p(k, 2);
somefun(row, col);
end
What is the builtin function in place of allpairs
that I am looking for? Is there a convenient idiomatic pattern that someone has come up with?
Any number of loops can be defined inside another loop, i.e., there is no restriction for defining any number of loops. The nesting level can be defined at n times. You can define any type of loop inside another loop; for example, you can define 'while' loop inside a 'for' loop. // inner loop statements.
A nested loop has one loop inside of another. These are typically used for working with two dimensions such as printing stars in rows and columns as shown below. When a loop is nested inside another loop, the inner loop runs many times inside the outer loop.
Parallelizing nested loops. If we have nested for loops, it is often enough to simply parallelize the outermost loop: a(); #pragma omp parallel for for (int i = 0; i < 4; ++i) { for (int j = 0; j < 4; ++j) { c(i, j); } } z(); This is all that we need most of the time.
A nested loop is a loop within a loop, an inner loop within the body of an outer one. How this works is that the first pass of the outer loop triggers the inner loop, which executes to completion. Then the second pass of the outer loop triggers the inner loop again. This repeats until the outer loop finishes.
MrAzzman already pointed out how to linearise nested loops. Here is a general solution to linearise n nested loops.
1) Assuming you have a simple nested loop structure like this:
%dummy function for demonstration purposes
f=@(a,b,c)([a,b,c]);
%three loops
X=cell(4,5,6);
for a=1:size(X,1);
for b=1:size(X,2);
for c=1:size(X,3);
X{a,b,c}=f(a,b,c);
end
end
end
2) Basic linearisation using a for loop:
%linearized conventional loop
X=cell(4,5,6);
iterations=size(X);
for ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
3) Linearisation using a parfor loop.
%linearized parfor loop
X=cell(4,5,6);
iterations=size(X);
parfor ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{ix}=f(a,b,c);
end
4) Using the second version with a conventional for loop, the order in which the iterations are executed is altered. If anything relies on this you have to reverse the order of the indices.
%linearized conventional loop
X=cell(4,5,6);
iterations=fliplr(size(X));
for ix=1:prod(iterations)
[c,b,a]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
Reversing the order when using a parfor
loop is irrelevant. You can not rely on the order of execution at all. If you think it makes a difference, you can not use parfor
.
You should be able to do this with bsxfun
. I believe that bsxfun
will parallelise code where possible (see here for more information), in which case you should be able to do the following:
bsxfun(@somefun,(1:6)',1:6);
You would probably want to benchmark this though.
Alternatively, you could do something like the following:
function parfor_allpairs(fun, num_rows, num_cols)
parfor i=1:(num_rows*num_cols)
fun(mod(i-1,num_rows)+1,floor(i/num_cols)+1);
end
then call with:
parfor_allpairs(@somefun,6,6);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With