mov ecx, 16
looptop: .
.
.
loop looptop
How many times will this loop execute? What happens if ecx = 0 to start with? Does loop jump or fall through in that case?
loop is exactly like dec ecx / jnz, except it doesn't set flags. It's like the bottom of a do {} while(--ecx != 0); in C, so with ecx = 16 on entry (and a body that doesn't touch ecx) the body runs 16 times. If execution enters the loop with ecx = 0, wrap-around means the loop will run 2^32 times. (Or 2^64 times in 64-bit mode, because it uses RCX.)
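As a rough illustration of that do {} while behaviour, here is a minimal C sketch that models the counter rather than running real asm. The function names are made up for this example, and the 16-bit variant stands in for CX only so the wrap-around case finishes quickly; the 32-bit case behaves the same way, just with 2^32 iterations.

```c
#include <stdint.h>

/* Model of `loop` with a 32-bit counter (ECX): the body runs first,
   then the counter is decremented and tested, like dec ecx / jnz. */
uint64_t iterations_with_ecx(uint32_t ecx)
{
    uint64_t count = 0;
    do {
        count++;              /* stands in for the loop body */
    } while (--ecx != 0);     /* the `loop looptop` step */
    return count;
}

/* Same model with a 16-bit counter (CX). Starting from 0, the first
   decrement wraps to 0xFFFF, so the body runs 65536 times. */
uint32_t iterations_with_cx(uint16_t cx)
{
    uint32_t count = 0;
    do {
        count++;
    } while (--cx != 0);
    return count;
}
```

Calling iterations_with_ecx(16) models the loop in the question; iterations_with_cx(0) models entering a 16-bit loop with CX = 0.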
Unlike rep movsb/stosb/etc., it doesn't check for ECX=0 before decrementing, only after. (See footnote 1.)
The address-size determines whether it uses CX, ECX, or RCX. So in 64-bit code, addr32 loop is like dec ecx / jnz, while a regular loop is like dec rcx / jnz. Or in 16-bit code, it normally uses CX, but an address-size prefix (0x67) will make it use ecx. As Intel's manual says, it ignores REX.W, because that sets the operand-size, not the address-size.
rep string instructions use the address-size prefix the same way: it overrides the address size, and with it RCX vs. ECX as the counter (or CX vs. ECX in modes other than 64-bit). The operand-size for string instructions is already used to determine movsw vs. movsd vs. movsq, and you want address/repeat size to be orthogonal to that. Having loop and jrcxz/jecxz follow that behaviour just continues the design intent from 8086 of loop being intended for use with string operations when a simple rep couldn't get the job done; see below.
Related: Why are loops always compiled into "do...while" style (tail jump)? for more about loop structure in asm, while() {} vs. do {} while(), and how to lay them out.
Footnote 1: jcxz (or x86-64 jrcxz) was intended for use before the top of a do {} while style loop, to skip it if it should run 0 times. On modern CPUs test rcx, rcx / jz is more efficient.
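The guard from footnote 1 looks roughly like this in C. This is only a sketch of the control flow: fill_bytes is a hypothetical helper invented for the example, with the early return playing the role of jrcxz (or test / jz) ahead of a bottom-tested loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `value` into dst[0..n-1]. */
void fill_bytes(uint8_t *dst, size_t n, uint8_t value)
{
    if (n == 0)            /* the jrcxz or test/jz guard: skip the loop entirely */
        return;
    do {
        *dst++ = value;    /* loop body */
    } while (--n != 0);    /* bottom-of-loop test, like `loop` or dec / jnz */
}
```

Without the guard, n = 0 would wrap around in the do {} while and write far past the buffer instead of writing nothing.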
Stephen Morse, architect of 8086, wrote about the intended uses of loop/jcxz with string instructions in that section of his book, The 8086 Primer, available for free on his web site: https://www.stevemorse.org/8086/index.html. See the "complex string instructions" subsection, starting at the bottom of page 71. (Or start reading from earlier in the chapter, the whole String Instructions section starts on page 66. But note @ecm's review of a few things the book seems to explain poorly or incorrectly.)
If you're wondering about the design intent of x86 instructions, you won't find a better source than this. That's separate from the best / most efficient way to use them, especially on modern x86, but it's a very good intro for beginners to what you can do with asm instructions as building blocks.
If you ever want to know the details on an instruction, check the manual: either Intel's official vol.2 PDF instruction set reference manual, or an html extract with each entry on a different page (http://felixcloutier.com/x86/). But note that the HTML leaves out the intro and appendices that have details on how to interpret stuff, like when it says "flags are set according to the result" for instructions like add.
And you can (and should) also just try stuff in a debugger: single-step and watch registers change. Use a smaller starting value for ecx so you get to the interesting ecx=1 part sooner. See also the x86 tag wiki for links to manuals, guides, and asm debugging tips at the bottom.
And BTW, if the instructions inside the loop that aren't shown modify ecx, it could loop any number of times. For the question to have a simple and unique answer, you need a guarantee that the instructions between the label and the loop instruction don't modify ecx. (They could save/restore it, but if you're going to do that it's usually better to just use a different register as the loop counter. push/pop inside a loop makes your code hard to read.)
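To make the "any number of times" point concrete, here is a small C model, assuming a hypothetical loop body that itself does one extra dec ecx. The function name and the max_iters safety cap are inventions of this sketch, not part of the question's code.

```c
#include <stdint.h>

/* Model of a loop whose body also decrements the counter.
   `max_iters` is a safety cap added for this sketch only. */
uint32_t iterations_when_body_decrements(uint32_t ecx, uint32_t max_iters)
{
    uint32_t count = 0;
    do {
        count++;
        ecx--;                                    /* hypothetical body instruction: dec ecx */
    } while (--ecx != 0 && count < max_iters);    /* the `loop looptop` step */
    return count;
}
```

Starting from ecx = 16, the counter drops by 2 per iteration and the model stops after 8 iterations. Starting from an odd value, the counter is still odd at every bottom-of-loop test, so it never equals 0 there and only the cap ends the model.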
Rant about over-use of LOOP even when you already need to increment something else in the loop. LOOP isn't the only way to loop, and usually it's the worst.
You should normally never use the loop instruction unless optimizing for code-size at the expense of speed, because it's slow. Compilers don't use it. (So CPU vendors don't bother to make it fast; catch-22.) Use dec / jnz, or an entirely different loop condition. (See also http://agner.org/optimize/ to learn more about what's efficient.)
Loops don't even have to use a counter; it's often just as good if not better to compare a pointer to an end address, or to check for some other condition. (Pointless use of loop is one of my pet peeves, especially when you already have something in another register that would work as a loop counter.) Using cx as a loop counter often just ties up one of your precious few registers when you could have used cmp/jcc on another register you were incrementing anyway.
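Here is what a counter-free loop looks like, sketched in C: the pointer being incremented anyway doubles as the loop condition. sum_u32 is a hypothetical function written for this example, and the asm named in the comment is just the usual shape such a loop takes, not output from any particular compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n 32-bit values, looping on a pointer compare instead of a counter. */
uint64_t sum_u32(const uint32_t *a, size_t n)
{
    const uint32_t *end = a + n;   /* one-past-the-end address */
    uint64_t total = 0;
    /* In asm the loop condition would be roughly: add ptr, 4 / cmp ptr, end / jne */
    for (const uint32_t *p = a; p != end; ++p)
        total += *p;
    return total;
}
```

No register is tied up holding a separate count; the only loop state is the pointer and the end address.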
IMO, loop should be considered one of those obscure x86 instructions that beginners shouldn't be distracted with, like stosd (without a rep prefix), aam or xlatb. It does have real uses when optimizing for code size, though. (That's sometimes useful in real life for machine code, like for boot sectors, not just for stuff like code golf.)
IMO, just teach / learn how conditional branches work, and how to make loops out of them. Then you won't get stuck thinking there's something special about a loop that uses loop. I've seen an SO question or comment where someone said something like "I thought you had to declare loops", and didn't realize that loop was just an instruction.
</rant>. Like I said, loop is one of my pet peeves. It's an obscure code-golfing instruction, unless you're optimizing for an actual 8086.