Given N
elements, process only the first (0) and last (N-1) element.
But, if N = 1
, only process the single element once.
Using a loop that runs once or twice, as appropriate, lets us avoid duplicating the loop body. If there's a readable way to do this, it has a benefit for source-code size. It may also have advantages for machine-code size, if the loop body is large, and the compiler doesn't end up duplicating it.
I tried incrementing by N-1
but it will not work when N=1
(loops forever). Are there tricks (reverse loop f.i) that will fix this?
for (i = 0 ; i < N ; i += (N - 1))
My original problem concerns three nested loops in x,y,z direction, which is why I couldn't just process elem[0]) and elem[N-1]. Now I have the following
#define forEachLglBound(i_,j_,k_) \
for(Int i_ = 0;i_ < NPX;i_+=((NPX>1) ? (NPX-1) : 1)) \
for(Int j_ = 0;j_ < NPY;j_+=((NPY>1) ? (NPY-1) : 1)) \
for(Int k_ = 0;k_ < NPZ;k_+=((NPZ>1) ? (NPZ-1) : 1))
How about the following line. Very close (textwise!) to the original solution.
for (i = 0 ; i < N ; i += (N > 1) ? N-1 : 1)
If you can factor the loop body out into a function or macro, probably the most readable way is:
process( arr[0] );
if (N-1 != 0)
process( arr[N-1] );
If that's not an option, or you think one of these loop structures is readable enough:
XOR can toggle a variable between 0 and the value you XOR with.
int i=0;
do {
process(arr[i]);
} while(i ^= (N-1)); // xor to toggle, but stays zero if N-1 is zero
This compiles to better asm than any of the other options, as you can see on godbolt. I included the ideas from the other answers, including the simple if
(which does very well when duplicating the operation is cheap).
The xor version keeps working very well in a triple-nested loop (see the while_xor_triple
function at the bottom of the godbolt link).
Without using xor, I think this loop is moderately readable:
int i=0;
do {
process( arr[i] );
} while( (i!=N-1) && (i=N-1) );
if i
isn't already N-1
, set it to that and redo the loop. This should make it easy for the compiler to do the test efficiently, without having to compute any expressions other than N-1. gcc still tests and branches on both operands of the &&
separately, though, but with less overhead than some of the other answers.
The 2nd while loop could compile to something like this (but unfortunately doesn't, because gcc doesn't do the trickery of my first two insns):
xor edx, edx
.repeat:
mov ecx, edx
... loop body using i (ecx)
mov edx, [N]
dec edx ; edx = N-1
cmp ecx, edx
jne .repeat
I thought of the XOR idea while thinking about how to do it in asm. IDK if a smart compiler could make this output from the while(i!=N-1 && i=N-1)
version, but it's definitely better:
xor ecx, ecx
.repeat:
... loop body using i (ecx)
mov edx, [N]
dec edx ; edx = N-1
xor ecx, edx ; i ^= N-1
jnz .repeat ; jump if the last result wasn't zero
If you have N-1
stored in a register or memory, the mov/dec
to compute it on the fly go away.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With