I have following C snippet: <pre class="prettyprint"><code>int main() { int tablica [100]; bool visited [100]; int counter; int i; for(i=0;i<=99;i++) { if (visited[i]==0) { counter=counter+1; } } } </code></pre> which I converted to assembler. I received the following output: <pre class="prettyprint"><code> ; ... mov eax, DWORD PTR [rbp-8] cdqe movzx eax, BYTE PTR [rbp-528+rax] xor eax, 1 test al, al je .L3 ; ... </code></pre> Could anybody explain to me what is the meaning and purpose of the <code>CDQE</code> and <code>MOVZX</code> instructions in this code? I also don't understand what the use of the <code>XOR</code> instruction is.

The <code>CDQE</code> instruction sign-extends a DWORD (32-bit value) in the <code>EAX</code> register to a QWORD (64-bit value) in the <code>RAX</code> register. The <code>MOVZX</code> instruction zero-extends the source to the destination. In this case, it sign-extends the BYTE loaded from memory at <code>[rbp-528+rax]</code> to the DWORD destination register, <code>EAX</code>. The <code>XOR eax, 1</code> instruction just flips the lowest bit of <code>EAX</code>. If it is currently set (1), then it becomes clear (0). If it is currently clear (0), then it becomes set (1). What is the big picture? Well, it turns out that this is almost completely pointless code, the kind of output you get from a compiler without optimizations enabled. It serves very little purpose to try and analyze that. But, if you want, we can analyze it anyway. Here is the entire assembly output for your C code, as generated by GCC 8.2 at <code>-O0</code>, with each instruction annotated: <pre class="prettyprint"><code>main(): push rbp ; \ standard function mov rbp, rsp ; / prologue code sub rsp, 408 ; allocate space for stack array mov DWORD PTR [rbp-8], 0 ; i = 0 .L4: cmp DWORD PTR [rbp-8], 99 ; is i <= 99? jg .L2 ; jump to L2 if i > 99; otherwise fall through mov eax, DWORD PTR [rbp-8] ; EAX = i cdqe ; RAX = i movzx eax, BYTE PTR [rbp-528+rax] ; EAX = visited[i] xor eax, 1 ; flip low-order bit of EAX (EAX ^= 1) test al, al ; test if low-order bit is set? je .L3 ; jump to L3 if low-order bit is clear (== 0) ; (which means it was originally set (== 1), ; which means visited[i] != 0) ; otherwise (visited[i] == 0), fall through add DWORD PTR [rbp-4], 1 ; counter += 1 .L3: add DWORD PTR [rbp-8], 1 ; i += 1 jmp .L4 ; unconditionally jump to top of loop (L4) .L2: mov eax, 0 ; EAX = 0 (EAX is result of main function) leave ; function epilogue ret ; return </code></pre> Neither an assembly programmer nor an optimizing compiler would produce this code. It makes extremely ineffective use of the registers (preferring to load and store to memory, including values like <code>i</code> and the <code>counter</code>, which are prime targets for storing in registers), and it has a lot of pointless instructions. Of course, an optimizing compiler would really do a number on this code, eliding it completely since it has no observable side-effects. The output would just be: <pre class="prettyprint"><code>main(): xor eax, eax ; main will return 0 ret </code></pre> That's not so interesting to analyze, but far more efficient. That's why we pay our C compilers the big bucks. The C code also has undefined behavior in these lines: <pre class="prettyprint"><code>int counter; /* ... */ counter=counter+1; </code></pre> You never initialize <code>counter</code>, but then you try to read from it. Since it is a variable with automatic storage duration, its contents are not automatically initialized, and reading from an uninitialized variable is undefined behavior. This justifies a C compiler emitting any assembly code that it wants. Let's assume that <code>counter</code> is initialized to 0, and we were to write this assembly code by hand, ignoring the possibility of eliding the whole mess. We'd get something like: <pre class="prettyprint"><code>main(): mov edx, OFFSET visited ; EDX = &visited[0] xor eax, eax ; EAX = 0 MainLoop: cmp BYTE PTR [rdx], 1 ; \ EAX += (*RDX == 0) ? 1 adc eax, 0 ; / : 0 inc rdx ; RDX += 1 cmp rdx, OFFSET visited + 100 ; is *RDX == &visited[100]? jne MainLoop ; if not, keep looping; otherwise, done ret ; return, with result in EAX </code></pre> What happened? Well, the calling convention says that <code>EAX</code> always holds the return value, so I've put <code>counter</code> in <code>EAX</code> and assumed that we are returning <code>counter</code> from the function. <code>RDX</code> is a pointer tracking the current position in the <code>visited</code> array. It gets incremented by 1 (size of a BYTE) throughout the <code>MainLoop</code>. With that in mind, the rest of the code should be straightforward, except for the <code>ADC</code> instruction. This is an add-with-carry instruction, used to write the conditional <code>if</code> inside of the loop branchlessly. An <code>ADC</code> performs the following operation: <pre class="prettyprint"><code>destination = (destination + source + CF) </code></pre> where <code>CF</code> is the carry flag. The <code>CMP</code> instruction right before it sets the carry flag if <code>visited[i] == 0</code>, and the source is <code>0</code>, so it does just what I commented to the right of the instruction: it adds 1 to <code>EAX</code> (<code>counter</code>) if <code>*RDX == 0</code> (<code>visited[i] == 0</code>); otherwise, it adds 0 (which is a no-op). If you wanted to write branchy code, you'd do: <pre class="prettyprint"><code>main(): mov edx, OFFSET visited ; EDX = &visited[0] xor eax, eax ; EAX = 0 MainLoop: cmp BYTE PTR [rdx], 0 ; (*RDX == 0)? jne Skip ; if not, branch to Skip; if so, fall through inc eax ; EAX += 1 Skip: inc rdx ; RDX += 1 cmp rdx, OFFSET visited + 100 ; is *RDX == &visited[100]? jne MainLoop ; if not, keep looping; otherwise, done ret ; return, with result in EAX </code></pre> This works just as well, but depending on how predictable the values of the <code>visited</code> array are, may be slower due to branch prediction failure.

What is the meaning/use of the MOVZX, CDQE instructions in this code output by a C compiler?

I have following C snippet:

int main() {

    int tablica [100];
    bool visited [100];
    int counter;
    int i;

    for(i=0;i<=99;i++) {
        if (visited[i]==0) {
            counter=counter+1;
        }
    }

}

which I converted to assembler. I received the following output:

   ; ...

    mov     eax, DWORD PTR [rbp-8]
    cdqe
    movzx   eax, BYTE PTR [rbp-528+rax]
    xor     eax, 1
    test    al, al
    je      .L3

    ; ...

Could anybody explain to me what is the meaning and purpose of the CDQE and MOVZX instructions in this code? I also don't understand what the use of the XOR instruction is.

What does CLTQ mean in assembly?

all of rax. cltq stands for “convert long to quad” – a. “longword”is 32 bits, a quadword” is 64. d) Do a cqto instruction (with no operands) to sign-extend it to. rdx.

What is the AL register?

al and ah are the 8-bit, "char" size registers. al is the low 8 bits, ah is the high 8 bits. They're pretty similar to the old 8-bit registers of the 8008 back in 1972.

The CDQE instruction sign-extends a DWORD (32-bit value) in the EAX register to a QWORD (64-bit value) in the RAX register.

The MOVZX instruction zero-extends the source to the destination. In this case, it sign-extends the BYTE loaded from memory at [rbp-528+rax] to the DWORD destination register, EAX.

The XOR eax, 1 instruction just flips the lowest bit of EAX. If it is currently set (1), then it becomes clear (0). If it is currently clear (0), then it becomes set (1).

What is the big picture? Well, it turns out that this is almost completely pointless code, the kind of output you get from a compiler without optimizations enabled. It serves very little purpose to try and analyze that.

But, if you want, we can analyze it anyway. Here is the entire assembly output for your C code, as generated by GCC 8.2 at -O0, with each instruction annotated:

main():
        push    rbp                         ; \ standard function
        mov     rbp, rsp                    ; /  prologue code
        sub     rsp, 408                    ; allocate space for stack array
        mov     DWORD PTR [rbp-8], 0        ; i = 0
.L4:
        cmp     DWORD PTR [rbp-8], 99       ; is i <= 99?
        jg      .L2                         ; jump to L2 if i > 99; otherwise fall through
        mov     eax, DWORD PTR [rbp-8]      ; EAX = i
        cdqe                                ; RAX = i
        movzx   eax, BYTE PTR [rbp-528+rax] ; EAX = visited[i]
        xor     eax, 1                      ; flip low-order bit of EAX (EAX ^= 1)
        test    al, al                      ; test if low-order bit is set?
        je      .L3                         ; jump to L3 if low-order bit is clear (== 0)
                                            ;  (which means it was originally set (== 1),
                                            ;   which means visited[i] != 0)
                                            ; otherwise (visited[i] == 0), fall through
        add     DWORD PTR [rbp-4], 1        ; counter += 1
.L3:
        add     DWORD PTR [rbp-8], 1        ; i += 1
        jmp     .L4                         ; unconditionally jump to top of loop (L4)
.L2:
        mov     eax, 0                      ; EAX = 0 (EAX is result of main function)
        leave                               ; function epilogue
        ret                                 ; return

Neither an assembly programmer nor an optimizing compiler would produce this code. It makes extremely ineffective use of the registers (preferring to load and store to memory, including values like i and the counter, which are prime targets for storing in registers), and it has a lot of pointless instructions.

Of course, an optimizing compiler would really do a number on this code, eliding it completely since it has no observable side-effects. The output would just be:

main():
        xor     eax, eax    ; main will return 0
        ret

That's not so interesting to analyze, but far more efficient. That's why we pay our C compilers the big bucks.

The C code also has undefined behavior in these lines:

int counter;
/* ... */
counter=counter+1;

You never initialize counter, but then you try to read from it. Since it is a variable with automatic storage duration, its contents are not automatically initialized, and reading from an uninitialized variable is undefined behavior. This justifies a C compiler emitting any assembly code that it wants.

Let's assume that counter is initialized to 0, and we were to write this assembly code by hand, ignoring the possibility of eliding the whole mess. We'd get something like:

main():
        mov     edx, OFFSET visited             ; EDX = &visited[0]
        xor     eax, eax                        ; EAX = 0
MainLoop:
        cmp     BYTE PTR [rdx], 1               ; \ EAX += (*RDX == 0) ? 1
        adc     eax, 0                          ; /                    : 0
        inc     rdx                             ; RDX += 1
        cmp     rdx, OFFSET visited + 100       ; is *RDX == &visited[100]?
        jne     MainLoop                        ; if not, keep looping; otherwise, done
        ret                                     ; return, with result in EAX

What happened? Well, the calling convention says that EAX always holds the return value, so I've put counter in EAX and assumed that we are returning counter from the function. RDX is a pointer tracking the current position in the visited array. It gets incremented by 1 (size of a BYTE) throughout the MainLoop. With that in mind, the rest of the code should be straightforward, except for the ADC instruction.

This is an add-with-carry instruction, used to write the conditional if inside of the loop branchlessly. An ADC performs the following operation:

destination = (destination + source + CF)

where CF is the carry flag. The CMP instruction right before it sets the carry flag if visited[i] == 0, and the source is 0, so it does just what I commented to the right of the instruction: it adds 1 to EAX (counter) if *RDX == 0 (visited[i] == 0); otherwise, it adds 0 (which is a no-op).

If you wanted to write branchy code, you'd do:

main():
        mov     edx, OFFSET visited             ; EDX = &visited[0]
        xor     eax, eax                        ; EAX = 0
MainLoop:
        cmp     BYTE PTR [rdx], 0               ; (*RDX == 0)?
        jne     Skip                            ; if not, branch to Skip; if so, fall through
        inc     eax                             ; EAX += 1
Skip:
        inc     rdx                             ; RDX += 1
        cmp     rdx, OFFSET visited + 100       ; is *RDX == &visited[100]?
        jne     MainLoop                        ; if not, keep looping; otherwise, done
        ret                                     ; return, with result in EAX

This works just as well, but depending on how predictable the values of the visited array are, may be slower due to branch prediction failure.

What is the meaning/use of the MOVZX, CDQE instructions in this code output by a C compiler?

Tags:

x86

assembly

x86-64

64-bit

BartekS

People also ask

1 Answers

Cody Gray

Recent Activity

Donate For Us

What is the meaning/use of the MOVZX, CDQE instructions in this code output by a C compiler?

Tags:

x86

assembly

x86-64

64-bit

BartekS

People also ask

1 Answers

Cody Gray

Related questions

Recent Activity

Donate For Us