Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it efficient to use length() in loops?

I have a lot of cases when I iterate over a vector in MATLAB,

so I do this :

len = length(vector)
for i=1:len
 do_something();
end

But this is just an intuition that says "prevent calling the length() function every time". Is it true? Or is it the same thing as this, in terms of time requirement :

for i=1:length(vector)
 do_something();
end

Thanks for any help!

like image 898
jeff Avatar asked May 15 '14 12:05

jeff


3 Answers

If you're concerned about performance or speed - don't worry about it, really. It will not make any appreciable difference. @David gives some timings in his answer that back this up.

In terms of aesthetics, for what it's worth I would normally write

for i = 1:numel(v)
    doSomething(v(i))
end

I tend to prefer numel to length, as length gives you the length of longest array dimension, rather than the length of the first dimension. If you're always sure it's a vector they're the same, but I try to avoid that assumption.

Of course, if I needed access to numel(v) separately, then it would be worth extracting it to an intermediate variable vNum and writing for i = 1:vNum.

If you're interested in the semantics of the for loop indices, it's more complex than you think. When you write for i = 1:N, MATLAB does not always simply create the vector 1:N and then iterate through it. For example, you can write

for i = 1:realmax
    i
end

or

for i = 1:Inf
    i
end

or even

for i = Inf:-1:1
    i
end

and it will work (press Ctrl-C to escape). MATLAB doesn't create the loop indices as a vector in these case, and it couldn't, as it would be too large.

The parser has a more complex logic than that, with knowledge of several edge cases, and optimizations. Sometimes it will create the loop indices ahead of the loop, sometimes on the fly. I guarantee you will not be able to second-guess all the internals, unless you have access to the MATLAB source code.

Note also that the loop indices do not have to be a vector; if you supply an array, MATLAB loops through the columns. For example,

for i = magic(3)
    i
end

displays the columns of the array in turn. Similarly you can supply a cell array and it will iterate through the cells (NB the indices are the cells, not the elements within the cells).

For this reason, I would sometimes write

for vElement = v
    doSomething(vElement)
end

rather than the first pattern above using numel.

And of course, depending on the application, it may well be better to vectorize doSomething and call doSomething(v).

One last thing - we're talking so far about integer indices really, but remember that the colon operator can have any increment, such as pi:sqrt(2):10; and remember that the colon operator is not the same as linspace, and will not give you the same answers.

like image 135
Sam Roberts Avatar answered Oct 11 '22 05:10

Sam Roberts


I ran a test, first with JIT compilation on (I ran each test 3 times)

length(vector) 12.86 12.50 12.44    
N              13.00 12.52 12.55   
numel(vector)  12.83 12.55 12.56

and with JIT compilation off,

length(vector) 13.12 13.43 12.95   
N              12.54 13.04 13.00   
numel(vector)  12.57 12.92 12.72

I conclude that with JIT compilation on, it doesn't matter what you do, but without JIT then numel of N are best, but it doesn't seem to make a big difference. I can't explain this, and maybe this test is flawed, but there you go.

Here's the code:

clear
N=5e4;
A=rand(1e2);
tic()
vector=ones(1,N);
for i=1:length(vector)
    inv(A);
end
toc()
tic()
for i=1:N
    inv(A);
end
toc()
tic()
vector=ones(1,N);
for i=1:numel(vector)
    inv(A);
end
toc()
like image 1
David Avatar answered Oct 11 '22 04:10

David


They're the same. In the specific case of Matlab, don't worry about it: putting length() or any other function in the for initialization clause is always just as fast as evaluating it outside the loop, because for will only call it once either way. Your intuition is probably based on some other languages' for loops like C and Java, which have different behavior.

By definition, the Matlab for loop evaluates its argument expressions only once, at the beginning of the loop, to pre-compute a range or array of values for the loop index variable (i) to take on inside the loop passes. Unlike many other languages, Matlab's for does not re-evaluate some of the loop control statements each time through the loop. (This is also why assigning to the loop index variable inside the body of a Matlab for loop has no effect, where in C or Java it will allow you to "jump around" and alter the control flow.)

Have a read through the Matlab for documentation. It could stand to be more explicit about it, but you'll notice it's defined in terms of the values that the expressions resolve to, and not the expressions themselves.

Syntax equivalence

A C for loop is defined to have this behavior.

/* C-style for loop */
for ( A; B; C; ) {
    ...
}

/* basically equivalent to: */
{
    A;
    while ( B ) {
        ....
        C;
    }
}

Functionally, Matlab's for loop syntax equivalence is more like this.

% Matlab for loop

for i = A:B
    ...
end

% basically functionally equivalent to:

tmp_X = A:B;         % A and B only get evaluated outside the loop!
tmp_i = 1;           % tmp_* = an implicit variable you have no access to
while tmp_i < size(tmp_X,2)
    i = tmp_X(:,tmp_i);
    ...
    tmp_i = tmp_i + 1;
end

And in practice, Matlab can optimize away the creation of the concrete array tmp_X in the case of primitive values. This decoupling of the loop body from the control expressions also helps support the parallel parfor loop used with the Parallel Computing Toolbox, because the value of the loop index variable for every loop iteration is known before the loop starts, and independent of the execution of any of the loop passes.

Demonstration

You can confirm this behavior yourself by using a function that has an observable side effect in the loop control clause.

function show_for_behavior

for i = 1:three(NaN)
    disp(i);
end

function out = three(x)
disp('three() got called');
out = 3;

You can see there was only one invocation for the whole loop.

>> show_for_behavior
three() got called
     1
     2
     3

Language reasons

Here's where I speculate a bit.

Beyond convenience, I suspect one of the reasons Matlab defines its for loop the way it does, instead of providing you the C style syntactic sugar over a regular while loop, is that it's tricky to get the index variable right due to floating-point roundoff. By default, the numeric loop variables you're working with are doubles, and for large values of x (around 10^15), x + 1 == x, because the relative precision at x (eps(x)) is greater than 1.

So if you do the naive while-loop transformation of for i = A:B ... end like this, you'll have an infinite loop, because at each step, i = i + 1 will result in the same value of i due to rounding.

i = A;
while (i < B)
    ...
    i = i + 1;
end

To be able to perform loops over sequences of large values, you can compute the value range and number of steps, keep track of the loop index using a separate integer value, and construct the i value for each step using that counter and the step size, instead of incrementing a temporary variable on each pass. Something like this.

% original
for x = A:S:B; ...; end

% equivalent
nSteps = int64( ((B - A) / S) ) + int64(1);
i = int64(0);
while i < nSteps
    x = A + (S * double(i));
    ....
    i = i + int64(1);
end

You can only do this if the range min, max, and step are defined ahead of time for all passes, which isn't guaranteed with the more flexible while-loop form.

Note that in this case, for large A and B, x may have the exact same value for multiple iteration passes, but will eventually progress, and you'll get about as many loop iterations as you would expect if you were using infinite-precision values instead of approximate floating-point values. I suspect this is about what Matlab does internally in these cases.

Here's an example showing this behavior.

function big_for_loop(a)

if nargin < 1;  a = 1e20;  end
b = a + 4 * eps(a);
step = 15;

fprintf('b - a = %f\n', b - a);
fprintf('1 + (b - a) / step = %f\n', 1 + (b - a) / step);

last_i = a;
n = 0;
for i = a : step : b
    n = n + 1;
    if (i ~= last_i); disp('i advanced');  end
    last_i = i;
end
fprintf('niters = %d\n', n);

When I run this, i changes about as you would expect based on eps if this is how Matlab is doing the loop.

>> big_for_loop
b - a = 65536.000000
1 + (b - a) / step = 4370.066667
i advanced
i advanced
i advanced
i advanced
niters = 4370
like image 1
Andrew Janke Avatar answered Oct 11 '22 06:10

Andrew Janke