How exactly does the x86 LOOP instruction work?

Question

            mov    ecx, 16
looptop:    .
            .
            .
            loop looptop

How many times will this loop execute?

What happens if ecx = 0 to start with? Does loop jump or fall-through in that case?

Peter Cordes · Accepted Answer

loop is exactly like dec ecx / jnz, except it doesn't set flags.

It's like the bottom of a do {} while(--ecx != 0); in C. If execution enters the loop with ecx = 0, wrap-around means the loop will run 2^32 times. (Or 2^64 times in 64-bit mode, because it uses RCX.)

Unlike rep movsb/stosb/etc., it doesn't check for ECX=0 before decrementing, only after¹.

The address-size determines whether it uses CX, ECX, or RCX. So in 64-bit code, addr32 loop is like dec ecx / jnz, while a regular loop is like dec rcx / jnz. Or in 16-bit code, it normally uses CX, but an address-size prefix (0x67) will make it use ecx. As Intel's manual says, it ignores REX.W, because that sets the operand-size, not the address-size.

rep string instructions use the address-size prefix the same way, overriding the address size but also RCX vs. ECX (or CX vs. ECX in modes other than 64-bit). The operand-size for string instructions is already used to determine movsw vs. movsd vs. movsq, and you want address/repeat size to be orthogonal to that. Having loop and jrcxz/jecxz follow that behaviour is just continuing the design intent from 8086 of loop being intended for use with string operations when a simple rep couldn't get the job done; see below.

Related: Why are loops always compiled into "do...while" style (tail jump)? for more about loop structure in asm, while() {} vs. do {} while() and how to lay them out.

Footnote 1: jcxz (or x86-64 jrcxz) was intended for use before the top of a do {} while style loop, to skip it if it should run 0 times. On modern CPUs test rcx, rcx / jz is more efficient.

Stephen Morse, architect of 8086, wrote about the intended uses of loop/jcxz with string instructions in that section of his book, The 8086 Primer, available for free on his web site: https://www.stevemorse.org/8086/index.html. See the "complex string instructions" subsection, starting at the bottom of page 71. (Or start reading from earlier in the chapter, the whole String Instructions section starts on page 66. But note @ecm's review of a few things the book seems to explain poorly or incorrectly.)

If you're wondering about the design intent of x86 instructions, you won't find a better source than this. That's separate from the best / most efficient way to use them, especially on modern x86, but very good intro for beginners into what you can do with asm instructions as building blocks.

Extra debugging tips

If you ever want to know the details on an instruction, check the manual: either Intel's official vol.2 PDF instruction set reference manual, or an html extract with each entry on a different page (http://felixcloutier.com/x86/). But note that the HTML leaves out the intro and appendices that have details on how to interpret stuff, like when it says "flags are set according to the result" for instructions like add.

And you can (and should) also just try stuff in a debugger: single-step and watch registers change. Use a smaller starting value for ecx so you get to the interesting ecx=1 part sooner. See also the x86 tag wiki for links to manuals, guides, and asm debugging tips at the bottom.

And BTW, if the instructions inside the loop that aren't shown modify ecx, it could loop any number of times. For the question to have a simple and unique answer, you need a guarantee that the instructions between the label and the loop instruction don't modify ecx. (They could save/restore it, but if you're going to do that it's usually better to just use a different register as the loop counter. push/pop inside a loop makes your code hard to read.)

Rant about over-use of LOOP even when you already need to increment something else in the loop. LOOP isn't the only way to loop, and usually it's the worst.

You should normally never use the loop instruction unless optimizing for code-size at the expense of speed, because it's slow. Compilers don't use it. (So CPU vendors don't bother to make it fast; catch 22.) Use dec / jnz, or an entirely different loop condition. (See also http://agner.org/optimize/ to learn more about what's efficient.)

Loops don't even have to use a counter; it's often just as good if not better to compare a pointer to an end address, or to check for some other condition. (Pointless use of loop is one of my pet peeves, especially when you already have something in another register that would work as a loop counter.) Using cx as a loop counter often just ties up one of your precious few registers when you could have used cmp/jcc on another register you were incrementing anyway.

IMO, loop should be considered one of those obscure x86 instructions that beginners shouldn't be distracted with. Like stosd (without a rep prefix), aam or xlatb. It does have real uses when optimizing for code size, though. (That's sometimes useful in real life for machine code (like for boot sectors), not just for stuff like code golf.)

IMO, just teach / learn how conditional branches work, and how to make loops out of them. Then you won't get stuck into thinking there's something special about a loop that uses loop. I've seen an SO question or comment that said something like "I thought you had to declare loops", and didn't realize that loop was just an instruction.

</rant>. Like I said, loop is one of my pet peeves. It's an obscure code-golfing instruction, unless you're optimizing for an actual 8086.

How exactly does the x86 LOOP instruction work?

Tags:

loops

x86

assembly

instructions

Hannah Duncan

1 Answers

Extra debugging tips

Peter Cordes

Recent Activity

Donate For Us

How exactly does the x86 LOOP instruction work?

Tags:

loops

x86

assembly

instructions

Hannah Duncan

1 Answers

Extra debugging tips

Peter Cordes

Related questions

Recent Activity

Donate For Us