To generate assembly code we essentially request GCC to stop before the assembly stage of compilation and dump what it has generated from the compiler backend. This writes the assembly code to a foobar. s file. For x86 and x64 assembly code, the AT&T syntax is used by default.
The GNU Assembler, commonly known as gas or as, is the assembler developed by the GNU Project. It is the default back-end of GCC. It is used to assemble the GNU operating system and the Linux kernel, and various other software.
The asm keyword allows you to embed assembler instructions within C code. GCC provides two forms of inline asm statements. A basic asm statement is one with no operands (see Basic Asm), while an extended asm statement (see Extended Asm) includes one or more operands.
asm volatile ("" ::: "memory") AFAIK is the same as the previous. The volatile keyword tells the compiler that it's not allowed to move this assembly block. For example, it may be hoisted out of a loop if the compiler decides that the input values are the same in every invocation.
For those who wonder what the generated code comes from, first note that when GCC compile myAbs
with stack protection it transform it into this form
long myAbs(long j) {
uintptr_t canary = __stack_chk_guard;
register long result = j < 0 ? -j : j;
if ( (canary = canary ^ __stack_chk_guard) != 0 )
__stack_chk_fail();
}
The code to simply perform j < 0 ? -j : j;
is
movq %rdi, %rdx ;RDX = j
movq %rdi, %rax ;RAX = j
sarq $63, %rdx ;RDX = 0 if j >=0, 0fff...ffh if j < 0
xorq %rdx, %rax ;Note: x xor 0ff...ffh = Not X, x xor 0 = x
;RAX = j if j >=0, ~j if j < 0
subq %rdx, %rax ;Note: 0fff...ffh = -1
;RAX = j+0 = j if j >= 0, ~j+1 = -j if j < 0
;~j+1 = -j in two complement
Analyzing the generated code we get
pushq %rbp
movq %rsp, %rbp ;Standard prologue
subq $4144, %rsp ;Allocate slight more than 4 KiB
orq $0, (%rsp) ;Perform a useless RW operation to test if there is enough stack space for __stack_chk_fail
addq $4128, %rsp ;This leave 16 byte allocated for local vars
movq %rdi, %rdx ;See above
sarq $63, %rdx ;See above
movq %fs:40, %rax ;Get the canary
movq %rax, -8(%rbp) ;Save it as a local var
xorl %eax, %eax ;Clear it
movq %rdi, %rax ;See above
xorq %rdx, %rax ;See above
subq %rdx, %rax ;See above
movq -8(%rbp), %rcx ;RCX = Canary
xorq %fs:40, %rcx ;Check if equal to the original value
jne .L5 ;If not fail
leave
ret
.L5:
call __stack_chk_fail@PLT ;__stack_chk_fail is noreturn
So all the extra instructions are for implementing the Stack Smashing Protector.
Thanks to FUZxxl for pointing out the use of the first instructions after the prologue.
Many of the beginning calls are to setup the stack and save the return address (something which you are not doing). Seems like theres are some stack protection going on. Perhaps you could tune your compiler settings to get rid of some overhead.
Perhaps adding flags to you compiler such as: -fno-stack-protector
could minimise this difference.
Yes this probably is slower than your handwritten assembly, but offers much more protection and is probably worth the slight overhead.
As for why the stack protection still exists even though it is a leaf function see here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With