I have some unknown C++ code that was compiled in Release build, so it's optimized. The point I'm struggling with is:
xor al, al
add esp, 8
cmp byte ptr [ebp+userinput], 31h
movzx eax, al
This is my understanding:
xor al, al ; set eax to 0x??????00 (clear last byte)
add esp, 8 ; for some unclear reason, set the stack pointer higher
cmp byte ptr [ebp+userinput], 31h ; set zero flag if user input was "1"
movzx eax, al ; set eax to AL and extend with zeros, so eax = 0x000000??
I don't care about lines 2 and 3. They might be in this order for pipelining reasons and IMHO have nothing to do with EAX.
However, I don't understand why I would clear AL first, just to clear the rest of EAX later. The result will IMHO always be EAX = 0, so this could also be xor eax, eax instead. What is the advantage or "optimization" of that piece of code?
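The claim that EAX always ends up as 0 can be checked with a small model of the register writes. This is only a sketch: the helper names are hypothetical, and it models the value of EAX alone, ignoring flags, the stack adjustment, and the cmp (none of which write EAX).

```cpp
#include <cstdint>

// Hypothetical helpers: each models only what the instruction does to EAX.

// xor al, al: clears the low byte, leaves the upper 24 bits untouched.
constexpr std::uint32_t xor_al_al(std::uint32_t eax) {
    return eax & 0xFFFFFF00u;
}

// movzx eax, al: EAX becomes the low byte, zero-extended to 32 bits.
constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0x000000FFu;
}

// xor eax, eax: EAX becomes 0 regardless of its old value.
constexpr std::uint32_t xor_eax_eax(std::uint32_t /*old_eax*/) {
    return 0u;
}

// The sequence from the question, reduced to the two writes to EAX.
constexpr std::uint32_t question_sequence(std::uint32_t eax) {
    return movzx_eax_al(xor_al_al(eax));
}

// After xor al, al the upper bytes are still the old ones...
static_assert(xor_al_al(0x12345678u) == 0x12345600u, "upper bytes preserved");
// ...but after the movzx the whole register is zero, same as xor eax, eax.
static_assert(question_sequence(0x12345678u) == xor_eax_eax(0x12345678u),
              "both sequences leave EAX == 0");
```

In this model the two-instruction sequence and a single xor eax, eax produce the same value for any starting EAX, so any difference between them is about code size and performance, not about the result.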
Some background info:
I will get the source code later. It's a short C++ console demo program, maybe only 20 lines of code, so nothing that I would call "complex" code. IDA shows a single loop in that program, but not around this piece. The Stud_PE signature scan didn't find anything, but it's likely the Visual Studio 2013 or 2015 compiler.
XOR EAX,EAX simply zeroes the EAX register. It executes faster than MOV EAX, 0 and doesn't need to fetch an immediate 0 to load into EAX. It's very obvious that this is the "return 0" that MSVC is optimizing: EAX is the register MSVC uses to return a value from a function.
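To make the size difference concrete, here is a sketch that writes out one valid 32-bit encoding of each instruction as a byte array. The constant and function names are hypothetical; an assembler or compiler may choose the equivalent 33 C0 form for the xor, which is also two bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// xor eax, eax -> opcode 31 /r with ModRM C0 (both operands EAX): 2 bytes.
constexpr std::array<std::uint8_t, 2> kXorEaxEax = {0x31, 0xC0};

// mov eax, 0 -> opcode B8 followed by a 32-bit immediate: 5 bytes.
constexpr std::array<std::uint8_t, 5> kMovEaxZero = {0xB8, 0x00, 0x00, 0x00, 0x00};

// Bytes saved by choosing the xor form over the mov form.
constexpr std::size_t bytes_saved() {
    return kMovEaxZero.size() - kXorEaxEax.size();
}

static_assert(bytes_saved() == 3, "xor eax, eax is 3 bytes shorter");
```

The four zero bytes at the end of the mov encoding are the immediate that the xor form does not have to carry.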
TL;DR summary: xor same, same is the best choice for all CPUs. No other method has any advantage over it, and it has at least some advantage over any other method. It's officially recommended by Intel and AMD, and what compilers do.
On modern CPUs the XOR pattern is preferred. It is smaller, and faster.
XOR has two relevant properties: any value XOR'd with zero is left unchanged, and any value XOR'd with itself gives zero.
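Both properties are easy to check in C++. The function names below are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// x ^ 0 leaves every bit of x as it was.
constexpr std::uint32_t xor_with_zero(std::uint32_t x) {
    return x ^ 0u;
}

// x ^ x clears every bit, because each bit is XOR'd with an identical bit.
constexpr std::uint32_t xor_with_self(std::uint32_t x) {
    return x ^ x;
}

static_assert(xor_with_zero(0xDEADBEEFu) == 0xDEADBEEFu, "x ^ 0 == x");
static_assert(xor_with_self(0xDEADBEEFu) == 0u, "x ^ x == 0");
```

The second property is what makes xor of a register with itself a zeroing idiom: the result does not depend on the register's old value.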
xor al,al is already slower than xor eax,eax on most CPUs. For example, on Haswell/Skylake it needs an ALU uop and doesn't break the dependency on the old value of eax/rax. It's equally bad on AMD CPUs, or Atom/Silvermont. (Well, maybe not equally because AMD doesn't eliminate xor eax,eax at issue/rename, but it still has a false dependency which could serialize the new dependency chain with whatever used eax last.)
On CPUs that do rename al separately from the rest of the register (Intel pre-IvyBridge), the xor al,al may still be recognized as a zeroing idiom, but unless you actively want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.
Doing movzx on top of that just makes it even worse.
I'm guessing your compiler somehow got confused and decided it needed a 1-byte zero, but then realized it needed to promote it to 32 bits. xor sets flags, so it couldn't xor-zero after the cmp, and it failed to notice that it could have just xor-zeroed eax before the cmp.
Either that or it's something like Jester's suggestion, where the movzx is a branch target. Even if that's the case, xor eax,eax would still have been better because zero-extending into eax follows unconditionally on this code path.
I'm curious what compiler produced this from what source.