
Any advantage of XOR AL,AL + MOVZX EAX, AL over XOR EAX,EAX?

Tags: c++, x86, assembly

I have some unknown C++ code that was compiled in Release build, so it's optimized. The point I'm struggling with is:

xor     al, al
add     esp, 8
cmp     byte ptr [ebp+userinput], 31h
movzx   eax, al

This is my understanding:

xor     al, al    ; set eax to 0x??????00 (clear last byte)
add     esp, 8    ; for some unclear reason, set the stack pointer higher
cmp     byte ptr [ebp+userinput], 31h ; set zero flag if user input was "1"
movzx   eax, al   ; set eax to AL and extend with zeros, so eax = 0x000000??

I don't care about lines 2 and 3. They might be there in this order for pipelining reasons and IMHO have nothing to do with EAX.

However, I don't understand why I would clear AL first, just to clear the rest of EAX later. The result will IMHO always be EAX = 0, so this could also be

xor eax, eax

instead. What is the advantage or "optimization" of that piece of code?
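To make the claimed equivalence concrete, here is a minimal C++ sketch that models only the architectural effect on EAX. The function names are hypothetical, invented for this illustration. It ignores flags and performance and assumes that the add and cmp in between do not write EAX.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical model: EAX is a 32-bit value, AL is its low 8 bits.
constexpr std::uint32_t xor_al_al(std::uint32_t eax) {
    return eax & 0xFFFFFF00u;   // xor al, al: clears only the low byte
}

constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0x000000FFu;   // movzx eax, al: keeps AL, zeroes bits 8..31
}

constexpr std::uint32_t xor_eax_eax(std::uint32_t eax) {
    return eax ^ eax;           // xor eax, eax: always 0
}

// The two EAX-writing instructions from the listing, in order.
constexpr std::uint32_t sequence(std::uint32_t eax) {
    return movzx_eax_al(xor_al_al(eax));
}

// Compile-time checks for one arbitrary starting value.
static_assert(xor_al_al(0xDEADBEEFu) == 0xDEADBE00u, "upper bytes survive xor al, al");
static_assert(sequence(0xDEADBEEFu) == xor_eax_eax(0xDEADBEEFu), "same final EAX");

// Runtime spot-check over a few starting values.
inline void check_samples() {
    for (std::uint32_t eax : {0u, 0xFFu, 0xDEADBEEFu, 0xFFFFFFFFu}) {
        assert(sequence(eax) == xor_eax_eax(eax));
    }
}
```

In this model both variants leave EAX = 0 for every starting value, so any difference would have to come from flags, encoding size, or execution cost rather than from the result.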

Some background info:

I will get the source code later. It's a short C++ console demo program, maybe 20 lines of code only, so nothing that I would call "complex" code. IDA shows a single loop in that program, but not around this piece. The Stud_PE signature scan didn't find anything, but it's likely the Visual Studio 2013 or 2015 compiler.

Thomas Weller asked Nov 08 '17 21:11




1 Answer

xor al,al is already slower than xor eax,eax on most CPUs. For example, on Haswell/Skylake it needs an ALU uop and doesn't break the dependency on the old value of eax/rax. It's equally bad on AMD CPUs, or Atom/Silvermont. (Well, maybe not equally, because AMD doesn't eliminate xor eax,eax at issue/rename, but xor al,al still has a false dependency, which could serialize the new dependency chain with whatever used eax last.)

On CPUs that do rename al separately from the rest of the register (Intel pre-IvyBridge), the xor al,al may still be recognized as a zeroing idiom, but unless you actively want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.

Doing movzx on top of that just makes it even worse.


I'm guessing your compiler somehow got confused and decided it needed a 1-byte zero, but then realized it needed to promote it to 32 bits. xor sets flags, so it couldn't xor-zero after the cmp, and it failed to notice that it could have just xor-zeroed eax before the cmp.
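The ordering constraint can be sketched with a toy C++ model that tracks only EAX and the zero flag. The Cpu struct and the helper names are hypothetical, made up for this illustration; real hardware updates more flags than ZF, and the model says nothing about speed.

```cpp
#include <cstdint>

// Hypothetical toy CPU state: just EAX and the zero flag.
struct Cpu {
    std::uint32_t eax = 0xDEADBEEFu;  // arbitrary stale value
    bool zf = false;
};

// cmp a, b sets ZF when the operands are equal; it writes no register.
inline void cmp_byte(Cpu& cpu, std::uint8_t a, std::uint8_t b) { cpu.zf = (a == b); }

// xor of a register with itself yields 0, so ZF is always set afterwards.
inline void xor_al_al(Cpu& cpu)   { cpu.eax &= 0xFFFFFF00u; cpu.zf = true; }
inline void xor_eax_eax(Cpu& cpu) { cpu.eax = 0;            cpu.zf = true; }

// movzx does not modify any flags.
inline void movzx_eax_al(Cpu& cpu) { cpu.eax &= 0xFFu; }

// What the compiler emitted: byte zero, compare, then widen.
inline Cpu as_emitted(std::uint8_t input) {
    Cpu cpu;
    xor_al_al(cpu);
    cmp_byte(cpu, input, 0x31);
    movzx_eax_al(cpu);
    return cpu;
}

// The better alternative: zero all of EAX before the compare.
inline Cpu zero_before_cmp(std::uint8_t input) {
    Cpu cpu;
    xor_eax_eax(cpu);
    cmp_byte(cpu, input, 0x31);
    return cpu;
}

// The ordering that is not allowed: xor after cmp destroys the compare result.
inline Cpu zero_after_cmp(std::uint8_t input) {
    Cpu cpu;
    cmp_byte(cpu, input, 0x31);
    xor_eax_eax(cpu);
    return cpu;
}
```

In this model, as_emitted and zero_before_cmp end with the same EAX and the same ZF for any input, while zero_after_cmp always leaves ZF set, so a branch that depends on the cmp would see the wrong flag.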

Either that or it's something like Jester's suggestion, where the movzx is a branch target. Even if that's the case, xor eax,eax would still have been better because zero-extending into eax follows unconditionally on this code path.

I'm curious what compiler produced this from what source.

Peter Cordes answered Sep 28 '22 07:09