Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the meaning/use of the MOVZX, CDQE instructions in this code output by a C compiler?

I have following C snippet:

int main() {

    int tablica [100];
    bool visited [100];
    int counter;
    int i;

    for(i=0;i<=99;i++) {
        if (visited[i]==0) {
            counter=counter+1;
        }
    }

}

which I converted to assembler. I received the following output:

   ; ...

    mov     eax, DWORD PTR [rbp-8]
    cdqe
    movzx   eax, BYTE PTR [rbp-528+rax]
    xor     eax, 1
    test    al, al
    je      .L3

    ; ...

Could anybody explain to me what is the meaning and purpose of the CDQE and MOVZX instructions in this code? I also don't understand what the use of the XOR instruction is.

like image 544
BartekS Avatar asked Feb 10 '19 16:02

BartekS


People also ask

What does CLTQ mean in assembly?

all of rax. cltq stands for “convert long to quad” – a. “longword”is 32 bits, a quadword” is 64. d) Do a cqto instruction (with no operands) to sign-extend it to. rdx.

What is the AL register?

al and ah are the 8-bit, "char" size registers. al is the low 8 bits, ah is the high 8 bits. They're pretty similar to the old 8-bit registers of the 8008 back in 1972.


1 Answers

The CDQE instruction sign-extends a DWORD (32-bit value) in the EAX register to a QWORD (64-bit value) in the RAX register.

The MOVZX instruction zero-extends the source to the destination. In this case, it sign-extends the BYTE loaded from memory at [rbp-528+rax] to the DWORD destination register, EAX.

The XOR eax, 1 instruction just flips the lowest bit of EAX. If it is currently set (1), then it becomes clear (0). If it is currently clear (0), then it becomes set (1).

What is the big picture? Well, it turns out that this is almost completely pointless code, the kind of output you get from a compiler without optimizations enabled. It serves very little purpose to try and analyze that.

But, if you want, we can analyze it anyway. Here is the entire assembly output for your C code, as generated by GCC 8.2 at -O0, with each instruction annotated:

main():
        push    rbp                         ; \ standard function
        mov     rbp, rsp                    ; /  prologue code
        sub     rsp, 408                    ; allocate space for stack array
        mov     DWORD PTR [rbp-8], 0        ; i = 0
.L4:
        cmp     DWORD PTR [rbp-8], 99       ; is i <= 99?
        jg      .L2                         ; jump to L2 if i > 99; otherwise fall through
        mov     eax, DWORD PTR [rbp-8]      ; EAX = i
        cdqe                                ; RAX = i
        movzx   eax, BYTE PTR [rbp-528+rax] ; EAX = visited[i]
        xor     eax, 1                      ; flip low-order bit of EAX (EAX ^= 1)
        test    al, al                      ; test if low-order bit is set?
        je      .L3                         ; jump to L3 if low-order bit is clear (== 0)
                                            ;  (which means it was originally set (== 1),
                                            ;   which means visited[i] != 0)
                                            ; otherwise (visited[i] == 0), fall through
        add     DWORD PTR [rbp-4], 1        ; counter += 1
.L3:
        add     DWORD PTR [rbp-8], 1        ; i += 1
        jmp     .L4                         ; unconditionally jump to top of loop (L4)
.L2:
        mov     eax, 0                      ; EAX = 0 (EAX is result of main function)
        leave                               ; function epilogue
        ret                                 ; return

Neither an assembly programmer nor an optimizing compiler would produce this code. It makes extremely ineffective use of the registers (preferring to load and store to memory, including values like i and the counter, which are prime targets for storing in registers), and it has a lot of pointless instructions.

Of course, an optimizing compiler would really do a number on this code, eliding it completely since it has no observable side-effects. The output would just be:

main():
        xor     eax, eax    ; main will return 0
        ret

That's not so interesting to analyze, but far more efficient. That's why we pay our C compilers the big bucks.

The C code also has undefined behavior in these lines:

int counter;
/* ... */
counter=counter+1;

You never initialize counter, but then you try to read from it. Since it is a variable with automatic storage duration, its contents are not automatically initialized, and reading from an uninitialized variable is undefined behavior. This justifies a C compiler emitting any assembly code that it wants.

Let's assume that counter is initialized to 0, and we were to write this assembly code by hand, ignoring the possibility of eliding the whole mess. We'd get something like:

main():
        mov     edx, OFFSET visited             ; EDX = &visited[0]
        xor     eax, eax                        ; EAX = 0
MainLoop:
        cmp     BYTE PTR [rdx], 1               ; \ EAX += (*RDX == 0) ? 1
        adc     eax, 0                          ; /                    : 0
        inc     rdx                             ; RDX += 1
        cmp     rdx, OFFSET visited + 100       ; is *RDX == &visited[100]?
        jne     MainLoop                        ; if not, keep looping; otherwise, done
        ret                                     ; return, with result in EAX

What happened? Well, the calling convention says that EAX always holds the return value, so I've put counter in EAX and assumed that we are returning counter from the function. RDX is a pointer tracking the current position in the visited array. It gets incremented by 1 (size of a BYTE) throughout the MainLoop. With that in mind, the rest of the code should be straightforward, except for the ADC instruction.

This is an add-with-carry instruction, used to write the conditional if inside of the loop branchlessly. An ADC performs the following operation:

destination = (destination + source + CF)

where CF is the carry flag. The CMP instruction right before it sets the carry flag if visited[i] == 0, and the source is 0, so it does just what I commented to the right of the instruction: it adds 1 to EAX (counter) if *RDX == 0 (visited[i] == 0); otherwise, it adds 0 (which is a no-op).

If you wanted to write branchy code, you'd do:

main():
        mov     edx, OFFSET visited             ; EDX = &visited[0]
        xor     eax, eax                        ; EAX = 0
MainLoop:
        cmp     BYTE PTR [rdx], 0               ; (*RDX == 0)?
        jne     Skip                            ; if not, branch to Skip; if so, fall through
        inc     eax                             ; EAX += 1
Skip:
        inc     rdx                             ; RDX += 1
        cmp     rdx, OFFSET visited + 100       ; is *RDX == &visited[100]?
        jne     MainLoop                        ; if not, keep looping; otherwise, done
        ret                                     ; return, with result in EAX

This works just as well, but depending on how predictable the values of the visited array are, may be slower due to branch prediction failure.

like image 87
Cody Gray Avatar answered Oct 20 '22 19:10

Cody Gray