Motivation for useless prologue in gcc-compiled main(), disabling it?

Question

Given the following minimal test case:

void exit(int);

int main() { 
    exit(0);
}

GCC 4.9 and later with 32-bit x86 target produces something like:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4, %esp
        subl    $12, %esp
        pushl   $0
        call    exit

Note the convoluted stack-realignment code. With the function renamed to anything but main, however, it gives the (much more reasonable):

xmain:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        subl    $12, %esp
        pushl   $0
        call    exit

The differences are even more pronounced with -O. As main nothing changes; renamed, it yields:

xmain:
        subl    $24, %esp
        pushl   $0
        call    exit

The above was noticed in answering this question:

How do i get rid of call __x86.get_pc_thunk.ax

Is this behavior (and its motivation) documented anywhere, and is there any way to suppress it? GCC has x86 target-specific options to set the preferred/assumed incoming and outgoing stack alignment and enable/disable realignment for arbitrary functions, but they don't seem to be honored for main.

zwol · Accepted Answer

This answer is based on source diving. I do not know what the developers' intentions or motivations were. All of the code involved seems to date to 2008ish, which is after my own time working on GCC, but long enough ago that people's memories have probably gotten fuzzy. (GCC 4.9 was released in 2014; did you go back any farther than that? If I'm right about when this code was introduced, the clumsy stack alignment for main should start happening in version 4.4.)

GCC's x86 back end appears to have been coded to make extra-conservative assumptions about the stack alignment on entry to main, regardless of command-line options. The function ix86_minimum_incoming_stack_boundary is called to compute the expected stack alignment on entry for each function, and the last thing it does ...

12523   /* Stack at entrance of main is aligned by runtime.  We use the
12524      smallest incoming stack boundary. */
12525   if (incoming_stack_boundary > MAIN_STACK_BOUNDARY
12526       && DECL_NAME (current_function_decl)
12527       && MAIN_NAME_P (DECL_NAME (current_function_decl))
12528       && DECL_FILE_SCOPE_P (current_function_decl))
12529     incoming_stack_boundary = MAIN_STACK_BOUNDARY;
12530 
12531   return incoming_stack_boundary;

... is override the expected stack alignment to a conservative constant, MAIN_STACK_BOUNDARY, if the function being compiled is main. MAIN_STACK_BOUNDARY is 128 (bits) when compiling 64-bit code and 32 when compiling 32-bit code. As far as I can tell, there is no command-line knob that will make it expect the stack to be more aligned than that on entry to main. I can persuade it to skip stack alignment for main by telling it that no additional alignment is needed, compiling your test program with -m32 -mpreferred-stack-boundary=2 gives me

main:
        pushl   $0
        call    exit

with GCC 7.3.

The write-only manipulations of %ecx appear to be a missed-optimization bug. They are coming from this part of ix86_expand_prologue:

13695       /* Grab the argument pointer.  */
13696       t = plus_constant (Pmode, stack_pointer_rtx, m->fs.sp_offset);
13697       insn = emit_insn (gen_rtx_SET (crtl->drap_reg, t));
13698       RTX_FRAME_RELATED_P (insn) = 1;
13699       m->fs.cfa_reg = crtl->drap_reg;
13700       m->fs.cfa_offset = 0;
13701
13702       /* Align the stack.  */
13703       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
13704                                         stack_pointer_rtx,
13705                                         GEN_INT (-align_bytes)));
13706       RTX_FRAME_RELATED_P (insn) = 1;
13707

The intention is to save a pointer to the incoming argument area before realigning the stack, so that it is straightforward to access arguments. Either because this happens fairly late in the pipeline (after register allocation), or because the instructions are marked FRAME_RELATED, nothing manages to delete those instructions again when they turn out to be unnecessary.

I imagine the GCC devs would at least listen to a bug report about this, but they might reasonably consider it low priority, because these are instructions that are executed only once in the lifetime of the whole program, they're only actually dead when main doesn't use its arguments, and they only happen in the traditional 32-bit ABI, which I have the impression is considered a second-class target nowadays.

Motivation for useless prologue in gcc-compiled main(), disabling it?

Tags:

c

gcc

R.. GitHub STOP HELPING ICE

1 Answers

zwol

Recent Activity

Donate For Us

Motivation for useless prologue in gcc-compiled main(), disabling it?

Tags:

c

gcc

R.. GitHub STOP HELPING ICE

1 Answers

zwol

Related questions

Recent Activity

Donate For Us