 

Why does this function push RAX to the stack as the first operation?

In the assembly generated from the C++ source below, why is RAX pushed to the stack?

RAX, as I understand it from the ABI, could contain anything from the calling function. But we save it here, and then later move the stack pointer back by 8 bytes. So the RAX value on the stack is, I think, only relevant for the std::__throw_bad_function_call() operation?

The code:-

    #include <functional>

    void f(std::function<void()> a)
    {
      a();
    }

Output, from gcc.godbolt.org, using Clang 3.7.1 -O3:

    f(std::function<void ()>):                  # @f(std::function<void ()>)
            push    rax
            cmp     qword ptr [rdi + 16], 0
            je      .LBB0_1
            add     rsp, 8
            jmp     qword ptr [rdi + 24]    # TAILCALL
    .LBB0_1:
            call    std::__throw_bad_function_call()

I'm sure the reason is obvious, but I'm struggling to figure it out.

Here's a tailcall without the std::function<void()> wrapper for comparison:

    void g(void(*a)())
    {
      a();
    }

The trivial:

    g(void (*)()):             # @g(void (*)())
            jmp     rdi        # TAILCALL
JCx asked Jun 12 '16 11:06





2 Answers

The 64-bit ABI requires that the stack is aligned to 16 bytes before a call instruction.

call pushes an 8-byte return address on the stack, which breaks the alignment, so the compiler needs to do something to align the stack again to a multiple of 16 before the next call.

(The ABI design choice of requiring alignment before a call instead of after has the minor advantage that if any args were passed on the stack, this choice makes the first arg 16B-aligned.)

Pushing a don't-care value works well, and can be more efficient than sub rsp, 8 on CPUs with a stack engine.

BeniBela answered Sep 24 '22 02:09



The reason push rax is there is to align the stack back to a 16-byte boundary, conforming to the 64-bit System V ABI, in the case where the je .LBB0_1 branch is taken. The value placed on the stack isn't relevant. Another way would have been to subtract 8 from RSP with sub rsp, 8. The ABI states the alignment requirement this way:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

Prior to the call to function f, the stack was 16-byte aligned per the calling convention. After control was transferred via a CALL to f, the return address was placed on the stack, misaligning it by 8. push rax is a simple way of subtracting 8 from RSP and realigning it. If the branch is taken to call std::__throw_bad_function_call(), the stack will be properly aligned for that call to work.

In the case where the comparison falls through, the stack will appear just as it did at function entry once the add rsp, 8 instruction is executed. The return address of the CALLER of function f will be back at the top of the stack, and the stack will be misaligned by 8 again. This is what we want, because a TAIL CALL is being made with jmp qword ptr [rdi + 24] to transfer control to the function a. This JMPs to the function rather than CALLing it, so when function a does a RET it will return directly to the function that called f.
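The alignment bookkeeping on both paths can be checked with a little arithmetic. This is only a sketch: the starting RSP value and the helper names (aligned16, after_call, after_push, after_add8) are made up for illustration and are not part of any ABI or compiler output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of RSP arithmetic; every stack slot here is 8 bytes.
constexpr std::uint64_t kSlot = 8;

constexpr bool aligned16(std::uint64_t rsp) { return rsp % 16 == 0; }

// CALL pushes an 8-byte return address, so RSP drops by 8.
constexpr std::uint64_t after_call(std::uint64_t rsp) { return rsp - kSlot; }

// push rax (or sub rsp, 8) drops RSP by another 8.
constexpr std::uint64_t after_push(std::uint64_t rsp) { return rsp - kSlot; }

// add rsp, 8 undoes the push before the tail call.
constexpr std::uint64_t after_add8(std::uint64_t rsp) { return rsp + kSlot; }

// Arbitrary example address, chosen to be 16-byte aligned before "call f".
constexpr std::uint64_t caller_rsp = 0x7fffffffe000;
constexpr std::uint64_t entry_rsp  = after_call(caller_rsp);  // on entry to f
constexpr std::uint64_t pushed_rsp = after_push(entry_rsp);   // after push rax
constexpr std::uint64_t tail_rsp   = after_add8(pushed_rsp);  // before the jmp

static_assert(aligned16(caller_rsp),  "caller aligns RSP before the CALL");
static_assert(!aligned16(entry_rsp),  "return address misaligns RSP by 8");
static_assert(aligned16(pushed_rsp),  "push rax realigns RSP for the throw call");
static_assert(tail_rsp == entry_rsp,  "tail call sees the same RSP as f's entry");
```

If the static_asserts compile, the model agrees with the description: the throw path makes its call with RSP at a multiple of 16, and the tail-call path hands function a exactly the stack that f received.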

At a higher optimization level I would have expected that the compiler should be smart enough to do the comparison, and let it fall through directly to the JMP. What is at label .LBB0_1 could then align the stack to a 16-byte boundary so that call std::__throw_bad_function_call() works properly.


As @CodyGray pointed out, if you use GCC (not Clang) with an optimization level of -O2 or higher, the code produced does seem more reasonable. GCC 6.1 output from Godbolt is:

    f(std::function<void ()>):
            cmp     QWORD PTR [rdi+16], 0     # MEM[(bool (*<T5fc5>) (union _Any_data &, const union _Any_data &, _Manager_operation) *)a_2(D) + 16B],
            je      .L7 #,
            jmp     [QWORD PTR [rdi+24]]      # MEM[(const struct function *)a_2(D)]._M_invoker
    .L7:
            sub     rsp, 8    #,
            call    std::__throw_bad_function_call()        #

This code is more in line with what I would have expected. In this case it would appear that GCC's optimizer may handle this code generation better than Clang's.
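For readers mapping the offsets back to C++, here is a rough source-level sketch of what both listings do. MiniFunction is a hypothetical stand-in, not the real libstdc++ std::function definition: its field names and sizes are assumptions chosen so that, on a 64-bit target, the two pointers land at offsets 16 and 24 as in the assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::bad_function_call

// Hypothetical layout for illustration only (assumes 8-byte pointers).
struct MiniFunction {
    alignas(8) unsigned char storage[16];  // offset 0: room for the callable
    void (*manager)();                     // offset 16: null when empty
    void (*invoker)(const MiniFunction&);  // offset 24: runs the callable
};

static_assert(offsetof(MiniFunction, manager) == 16, "matches [rdi + 16]");
static_assert(offsetof(MiniFunction, invoker) == 24, "matches [rdi + 24]");

// Roughly what f compiles to: test the pointer at offset 16, then either
// throw or jump through the pointer at offset 24.
void invoke_or_throw(const MiniFunction& f) {
    if (f.manager == nullptr)            // cmp qword ptr [rdi + 16], 0 ; je
        throw std::bad_function_call();  // call std::__throw_bad_function_call()
    f.invoker(f);                        // jmp qword ptr [rdi + 24]
}

// Small helpers so the sketch can be exercised.
int hits = 0;
void noop_manager() {}
void bump(const MiniFunction&) { ++hits; }
```

Calling invoke_or_throw on a value-initialized MiniFunction throws std::bad_function_call, while setting manager to &noop_manager and invoker to &bump makes the call increment hits. Only the throwing path contains a real CALL, which is why only that path needs the stack realigned.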

Michael Petch answered Sep 22 '22 02:09
