Sometimes compilers generate code with weird instruction duplications that can safely be removed. Consider the following piece of code:
int gcd(unsigned x, unsigned y) { return x == 0 ? y : gcd(y % x, x); }
Here is the assembly code (generated by clang 5.0 with optimizations enabled):
gcd(unsigned int, unsigned int): # @gcd(unsigned int, unsigned int) mov eax, esi mov edx, edi test edx, edx je .LBB0_1 .LBB0_2: # =>This Inner Loop Header: Depth=1 mov ecx, edx xor edx, edx div ecx test edx, edx mov eax, ecx jne .LBB0_2 mov eax, ecx ret .LBB0_1: ret
In the following snippet:
mov eax, ecx jne .LBB0_2 mov eax, ecx
If the jump doesn't happen, eax
is reassigned for no obvious reason.
The other example is two ret's at the end of the function: one would perfectly work as well.
Is the compiler simply not intelligent enough or there's a reason to not remove the duplications?
Compilers can perform optimisations that are not obvious to people and removing instructions does not always make things faster.
A small amount of searching shows that various AMD processors have branch prediction problems when a RET is immediately after a conditional branch. By filling that slot with what is essentially a no-op, the performance problem is avoided.
Update:
Example reference, section 6.2 of the "Software Optimization Guide for AMD64 Processors" (see http://support.amd.com/TechDocs/25112.PDF) says:
Specifically, avoid the following two situations:
Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET instruction as its target. See “Examples.”
A conditional branch that occurs in the code directly before the single-byte near-return RET instruction.
It also goes into detail on why jump targets should have alignment which is also likely to explain the duplicate RETs at the end of the function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With