 

Manual Assembly vs GCC

Tags:

c

gcc

assembly

People also ask

Does GCC produce assembly?

To generate assembly code, pass -S: GCC stops before the assembly stage of compilation and dumps what the compiler back end has generated. For foobar.c, this writes the assembly code to a foobar.s file. For x86 and x86-64 assembly code, AT&T syntax is used by default.
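Here is a minimal sketch of the workflow, assuming a GCC toolchain (the syntax notes apply to x86 and x86-64 targets). The file is named foobar.c to match the foobar.s mentioned above. scale_and_add is a hypothetical function that only gives the compiler something to emit. The comment lists the commands: -S stops before the assembly stage, -masm=intel (x86 targets only) requests Intel syntax instead of AT&T, and -c on the .s file hands it to the assembler.

```c
/* foobar.c: a tiny translation unit to inspect with GCC.
 *
 *   gcc -O2 -S foobar.c               # writes foobar.s (AT&T syntax on x86)
 *   gcc -O2 -S -masm=intel foobar.c   # same, but Intel syntax (x86 only)
 *   gcc -c foobar.s                   # assemble foobar.s into foobar.o
 */

/* Hypothetical example function; any code works here. */
long scale_and_add(long x, long y) {
    return 3 * x + y;
}
```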

What assembly language does GCC use?

The GNU Assembler, commonly known as gas or as, is the assembler developed by the GNU Project. It is the assembler GCC normally invokes to turn its output into object code. It is used to assemble the GNU operating system, the Linux kernel, and various other software.

What is asm() in C?

The asm keyword allows you to embed assembler instructions within C code. GCC provides two forms of inline asm statements. A basic asm statement is one with no operands (see Basic Asm), while an extended asm statement (see Extended Asm) includes one or more operands.
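The two forms can be sketched as follows, assuming GCC on x86-64 for the extended example (other targets fall back to plain C). The function names are hypothetical. The code uses the __asm__ spelling because plain asm is not a keyword under strict ISO modes such as -std=c11.

```c
/* Basic asm: an instruction string with no operands.
   "nop" is accepted by GAS on common GCC targets (x86, Arm, RISC-V). */
void basic_asm_demo(void) {
    __asm__("nop");
}

/* Extended asm: operands connect C variables to the template.
   %0 is the read-write operand a, %1 is the input operand b. */
long add_with_asm(long a, long b) {
#if defined(__x86_64__)
    __asm__("addq %1, %0"   /* AT&T order: source, destination */
            : "+r"(a)       /* output a, also read as an input ("+") */
            : "r"(b)        /* input b in any general-purpose register */
            : "cc");        /* addq modifies the flags */
    return a;
#else
    return a + b;           /* plain C fallback on other architectures */
#endif
}
```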

What is asm volatile?

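The volatile keyword tells the compiler that the asm statement has side effects it cannot see, which disables certain optimizations. Without volatile, GCC may delete an asm statement whose outputs are unused, or hoist it out of a loop if it decides that the input values are the same in every iteration. volatile does not pin the statement in place, though: GCC can still move a volatile asm relative to other code. An asm statement with no output operands is implicitly volatile, so asm volatile ("" ::: "memory") is equivalent to asm ("" ::: "memory"). The "memory" clobber makes it a compiler-level memory barrier. The compiler must assume memory was read and written, so it cannot reorder memory accesses across it. No CPU fence instruction is emitted.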
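Here is a minimal sketch of the barrier idiom. The names compiler_barrier, payload, ready and publish are hypothetical. The barrier only constrains the compiler. On weakly ordered CPUs, the hardware may still reorder the stores, so this is not a substitute for C11 atomics when another thread reads the flag.

```c
/* Compiler-level barrier: the empty template emits no instruction.
   volatile is written for clarity (an asm with no outputs is implicitly
   volatile). The "memory" clobber makes GCC assume any memory may have
   been read or written here, so it cannot reorder memory accesses across
   this point. It is NOT a CPU fence. */
static inline void compiler_barrier(void) {
    __asm__ __volatile__("" ::: "memory");
}

/* Hypothetical globals for the demo. */
int payload;
int ready;

/* GCC must emit the store to payload before the store to ready. */
void publish(int value) {
    payload = value;
    compiler_barrier();
    ready = 1;
}
```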


For those wondering where the generated code comes from: when GCC compiles myAbs with stack protection enabled, it effectively transforms it into this form:

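#include <stdint.h>

/* Illustrative C equivalent only: on x86-64 the guard value is read
   from %fs:40 (see the listing below), not from a global variable. */
long myAbs(long j) {
    uintptr_t canary = __stack_chk_guard;    /* copy the guard into the frame */

    register long result = j < 0 ? -j : j;

    if ( (canary = canary ^ __stack_chk_guard) != 0 )
        __stack_chk_fail();                  /* does not return */

    return result;
}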

The code that performs just j < 0 ? -j : j is:

movq    %rdi, %rdx     # RDX = j
movq    %rdi, %rax     # RAX = j
sarq    $63, %rdx      # RDX = 0 if j >= 0, 0xFFFFFFFFFFFFFFFF (= -1) if j < 0
xorq    %rdx, %rax     # Note: x xor 0xFF...FF = ~x, x xor 0 = x
                       # RAX = j if j >= 0, ~j if j < 0
subq    %rdx, %rax     # RAX = j - 0 = j if j >= 0, ~j - (-1) = ~j + 1 = -j if j < 0
                       # (~j + 1 = -j in two's complement)
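The same trick can be written in C. This sketch relies on GCC treating >> on a negative long as an arithmetic shift, which ISO C leaves implementation-defined. branchless_abs is a hypothetical name. The arithmetic is done in unsigned long to avoid signed-overflow undefined behavior. As in the assembly, LONG_MIN comes back unchanged under GCC.

```c
#include <limits.h>

/* C version of the sar/xor/sub sequence above (assumes GCC semantics). */
long branchless_abs(long j) {
    const int shift = (int)(sizeof(long) * CHAR_BIT - 1);  /* 63 on x86-64 */
    /* mask = 0 if j >= 0, all ones (-1) if j < 0, like "sarq $63, %rdx" */
    unsigned long mask = (unsigned long)(j >> shift);
    unsigned long u = (unsigned long)j;
    /* (u ^ mask) - mask: j when mask is 0, ~j + 1 == -j when mask is all ones */
    return (long)((u ^ mask) - mask);
}
```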

Annotating the generated code gives:

    pushq   %rbp
    movq    %rsp, %rbp       # Standard prologue

    subq    $4144, %rsp      # Allocate slightly more than 4 KiB
    orq     $0, (%rsp)       # Dummy read-write access to check there is enough stack space for __stack_chk_fail

    addq    $4128, %rsp      # This leaves 16 bytes allocated for local variables

    movq    %rdi, %rdx       # See above
    sarq    $63, %rdx        # See above

    movq    %fs:40, %rax     # Get the canary
    movq    %rax, -8(%rbp)   # Save it as a local variable
    xorl    %eax, %eax       # Clear it from the register

    movq    %rdi, %rax       # See above
    xorq    %rdx, %rax       # See above
    subq    %rdx, %rax       # See above

    movq    -8(%rbp), %rcx   # RCX = saved canary
    xorq    %fs:40, %rcx     # Compare with the original value (zero if unchanged)
    jne     .L5              # If they differ, fail

    leave
    ret
.L5:
    call    __stack_chk_fail@PLT  # __stack_chk_fail is noreturn

So all the extra instructions are for implementing the Stack Smashing Protector.

Thanks to FUZxxl for pointing out the use of the first instructions after the prologue.


Many of the instructions at the beginning set up the stack frame and save the frame pointer (something your hand-written version does not do). It also looks like stack protection is enabled. You could tune your compiler settings to remove some of this overhead.

Adding a compiler flag such as -fno-stack-protector could reduce this difference.
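Here is a sketch of both options, assuming GCC. The command-line flags apply to a whole translation unit. The no_stack_protector function attribute (GCC 11 and later; Clang accepts it too) disables the canary for a single function. NO_SSP and myabs.c are hypothetical names.

```c
/* Whole-file control (GCC):
 *   gcc -O2 -S -fno-stack-protector myabs.c   # no canary in any function
 *   gcc -O2 -S -fstack-protector-all myabs.c  # canary in every function
 *
 * Per-function control: NO_SSP is a hypothetical helper macro that expands
 * to the attribute when the compiler supports it, and to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(no_stack_protector)
#    define NO_SSP __attribute__((no_stack_protector))
#  endif
#endif
#ifndef NO_SSP
#  define NO_SSP
#endif

NO_SSP long myAbs(long j) {
    return j < 0 ? -j : j;
}
```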

Yes, this is probably slower than your hand-written assembly, but it offers much more protection and is probably worth the slight overhead.

As for why the stack protection is still there even though myAbs is a leaf function, see here.