I have following C snippet:
int main() {
int tablica [100];
bool visited [100];
int counter;
int i;
for(i=0;i<=99;i++) {
if (visited[i]==0) {
counter=counter+1;
}
}
}
which I converted to assembler. I received the following output:
; ...
mov eax, DWORD PTR [rbp-8]
cdqe
movzx eax, BYTE PTR [rbp-528+rax]
xor eax, 1
test al, al
je .L3
; ...
Could anybody explain to me what is the meaning and purpose of the CDQE
and MOVZX
instructions in this code? I also don't understand what the use of the XOR
instruction is.
all of rax. cltq stands for “convert long to quad” – a. “longword”is 32 bits, a quadword” is 64. d) Do a cqto instruction (with no operands) to sign-extend it to. rdx.
al and ah are the 8-bit, "char" size registers. al is the low 8 bits, ah is the high 8 bits. They're pretty similar to the old 8-bit registers of the 8008 back in 1972.
The CDQE
instruction sign-extends a DWORD (32-bit value) in the EAX
register to a QWORD (64-bit value) in the RAX
register.
The MOVZX
instruction zero-extends the source to the destination. In this case, it sign-extends the BYTE loaded from memory at [rbp-528+rax]
to the DWORD destination register, EAX
.
The XOR eax, 1
instruction just flips the lowest bit of EAX
. If it is currently set (1), then it becomes clear (0). If it is currently clear (0), then it becomes set (1).
What is the big picture? Well, it turns out that this is almost completely pointless code, the kind of output you get from a compiler without optimizations enabled. It serves very little purpose to try and analyze that.
But, if you want, we can analyze it anyway. Here is the entire assembly output for your C code, as generated by GCC 8.2 at -O0
, with each instruction annotated:
main():
push rbp ; \ standard function
mov rbp, rsp ; / prologue code
sub rsp, 408 ; allocate space for stack array
mov DWORD PTR [rbp-8], 0 ; i = 0
.L4:
cmp DWORD PTR [rbp-8], 99 ; is i <= 99?
jg .L2 ; jump to L2 if i > 99; otherwise fall through
mov eax, DWORD PTR [rbp-8] ; EAX = i
cdqe ; RAX = i
movzx eax, BYTE PTR [rbp-528+rax] ; EAX = visited[i]
xor eax, 1 ; flip low-order bit of EAX (EAX ^= 1)
test al, al ; test if low-order bit is set?
je .L3 ; jump to L3 if low-order bit is clear (== 0)
; (which means it was originally set (== 1),
; which means visited[i] != 0)
; otherwise (visited[i] == 0), fall through
add DWORD PTR [rbp-4], 1 ; counter += 1
.L3:
add DWORD PTR [rbp-8], 1 ; i += 1
jmp .L4 ; unconditionally jump to top of loop (L4)
.L2:
mov eax, 0 ; EAX = 0 (EAX is result of main function)
leave ; function epilogue
ret ; return
Neither an assembly programmer nor an optimizing compiler would produce this code. It makes extremely ineffective use of the registers (preferring to load and store to memory, including values like i
and the counter
, which are prime targets for storing in registers), and it has a lot of pointless instructions.
Of course, an optimizing compiler would really do a number on this code, eliding it completely since it has no observable side-effects. The output would just be:
main():
xor eax, eax ; main will return 0
ret
That's not so interesting to analyze, but far more efficient. That's why we pay our C compilers the big bucks.
The C code also has undefined behavior in these lines:
int counter;
/* ... */
counter=counter+1;
You never initialize counter
, but then you try to read from it. Since it is a variable with automatic storage duration, its contents are not automatically initialized, and reading from an uninitialized variable is undefined behavior. This justifies a C compiler emitting any assembly code that it wants.
Let's assume that counter
is initialized to 0, and we were to write this assembly code by hand, ignoring the possibility of eliding the whole mess. We'd get something like:
main():
mov edx, OFFSET visited ; EDX = &visited[0]
xor eax, eax ; EAX = 0
MainLoop:
cmp BYTE PTR [rdx], 1 ; \ EAX += (*RDX == 0) ? 1
adc eax, 0 ; / : 0
inc rdx ; RDX += 1
cmp rdx, OFFSET visited + 100 ; is *RDX == &visited[100]?
jne MainLoop ; if not, keep looping; otherwise, done
ret ; return, with result in EAX
What happened? Well, the calling convention says that EAX
always holds the return value, so I've put counter
in EAX
and assumed that we are returning counter
from the function. RDX
is a pointer tracking the current position in the visited
array. It gets incremented by 1 (size of a BYTE) throughout the MainLoop
. With that in mind, the rest of the code should be straightforward, except for the ADC
instruction.
This is an add-with-carry instruction, used to write the conditional if
inside of the loop branchlessly. An ADC
performs the following operation:
destination = (destination + source + CF)
where CF
is the carry flag. The CMP
instruction right before it sets the carry flag if visited[i] == 0
, and the source is 0
, so it does just what I commented to the right of the instruction: it adds 1 to EAX
(counter
) if *RDX == 0
(visited[i] == 0
); otherwise, it adds 0 (which is a no-op).
If you wanted to write branchy code, you'd do:
main():
mov edx, OFFSET visited ; EDX = &visited[0]
xor eax, eax ; EAX = 0
MainLoop:
cmp BYTE PTR [rdx], 0 ; (*RDX == 0)?
jne Skip ; if not, branch to Skip; if so, fall through
inc eax ; EAX += 1
Skip:
inc rdx ; RDX += 1
cmp rdx, OFFSET visited + 100 ; is *RDX == &visited[100]?
jne MainLoop ; if not, keep looping; otherwise, done
ret ; return, with result in EAX
This works just as well, but depending on how predictable the values of the visited
array are, may be slower due to branch prediction failure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With