I am learning assembly language again, and the only problem I have had so far is calling C functions. The book I have is geared to 32-bit, and I am working in 64-bit. Apparently there is a big difference in the calling conventions, and the http://www.x86-64.org/documentation site is down. So after some digging and testing, compiling dummy programs in C, and spending 3 days on this, I thought I would post my findings in case they help anyone else.
Does RAX need to be given the float count? Is the stack padding ("shadow space") 16 or 32 bytes? Is this macro for aligning the stack passable for small programs? I know you can NOP-pad the code with align, but I was not sure about the stack frame.
; pf.asm compiled with 'nasm -o pf.o -f elf64 -g -F stabs'
; linked with 'gcc -o pf pf.o'
; 64-bit Bodhi (ubuntu) linux
%include "amd64_abi.mac"
[SECTION .data]
First_string: db "First string.",10,"%s", "%d is an integer. So is %d",10
db "Floats XMM0:%5.7f XMM1:%.6le XMM2:%lg",10,0
Second_String: db "This is the second string... %s's are not interpreted here.",10
db " Neither are %d's nor %f's. 'Cause it is a passed value.", 10, 0
; Just a regular string for insert.
[SECTION .bss]
[SECTION .text]
EXTERN printf
GLOBAL main
main:
_preserve_64AMD_ABI_regs ; Saves RBP, RBX, R12-R15
mov rdi, First_string ; Start of string to be formatted. Null terminated
mov rsi, Second_String ; Address of the string substituted for the first %s. Not interpreted
mov rcx, 0456 ; Second Integer (Register is specific for ordered arguments.)
mov rdx, 0123 ; First integer (Order of assignment does not matter.)
; Order of Integer/Pointer Registers:
; $1:RDI $2:RSI $3:RDX $4:RCX $5:R8 $6:R9
mov rax,0AABBCCh ; Test value to be stored in xmm0
cvtsi2sd xmm0, rax ; Convert quad to scalar double
mov rax,003333h ; Test value to be stored in xmm1
cvtsi2sd xmm1, rax ; Convert quad to scalar double
cvtsi2sd xmm2, rax ; Convert quad to scalar double
divsd xmm2, xmm0 ; Divide scalar double
sub rsp, 16 ; Allocates 16 byte shadow memory
_prealign_stack_to16 ; Move down to the next 16-byte boundary (Seg-Fault otherwise)
; mov rax, 3 ; Count of xmm registers used for floats. ?!needed?!
Before_Call:
call printf ; Send the formatted string to C-printf
_return_aligned_stack ; Returns RSP to the previous alignment
add rsp, 16 ; deallocate shadow memory
_restore_64AMD_ABI_regs_RET
; Ends pf.asm
; amd64_abi.mac
; Macros to save/restore callee-saved registers and to align the stack (RSP)
; to a 16 byte boundary, keeping the padding amount in rbx
%macro _preserve_64AMD_ABI_regs 0
push rbp
mov rbp, rsp
push rbx
push r12
push r13
push r14
push r15
%endmacro
%macro _restore_64AMD_ABI_regs_RET 0
pop r15
pop r14
pop r13
pop r12
pop rbx
mov rsp, rbp
pop rbp
ret
%endmacro
%macro _prealign_stack_to16 0
mov rbx, 0Fh ; Bit mask for the low 4 bits: 10000b = 16, 01111b = 15
and rbx, rsp ; get bits 0-3 into rbx
sub rsp, rbx ; remove them from rsp, rounding down to multiple of 16 (10h)
%endmacro
; De-aligns stack (RSP) from 16 byte boundary using saved rbx offset
%macro _return_aligned_stack 0
add rsp, rbx
%endmacro
OUTPUT:
First string.
This is the second string... %s's are not interpreted here.
 Neither are %d's nor %f's. 'Cause it is a passed value.
123 is an integer. So is 456
Floats XMM0:11189196.0000000 XMM1:1.310700e+04 XMM2:0.0011714
Resources:
System V ABI v0.96: http://www.uclibc.org/docs/psABI-x86_64.pdf (not available at x86-64.org; the site is down)
Assembly Language Step By Step, Jeff Duntemann, Chapter 12
Intel 64-bit instruction set: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
Yes, RAX (actually AL) should hold the number of XMM registers used.
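A minimal sketch of that, assuming Linux, NASM and a gcc link step; the file name, labels and build line are my own and not from the question. printf is variadic, so AL is set to the number of vector registers that carry arguments (one here):

```nasm
; vararg.asm (hypothetical name)
; assumed build: nasm -f elf64 vararg.asm && gcc -o vararg vararg.o
default rel
section .data
fmt:    db "value: %f", 10, 0
val:    dq 2.5

section .text
extern printf
global main
main:
    push  rbp                ; entry RSP is 16n+8, so this push makes it 16-byte aligned
    mov   rbp, rsp
    lea   rdi, [rel fmt]     ; 1st integer/pointer argument: format string
    movsd xmm0, [rel val]    ; 1st floating-point argument goes in XMM0
    mov   eax, 1             ; AL = number of vector registers used (variadic call)
    call  printf wrt ..plt   ; via the PLT, so it links as PIE or non-PIE
    xor   eax, eax           ; return 0 from main
    pop   rbp
    ret
```

With no floating-point arguments, AL would be 0 (for example `xor eax, eax` before the call).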
Your stack alignment code is overcomplicated; normally you just do AND rsp, -16. Also, stack alignment is typically only done once (usually at the start of main) and then it is maintained by always adjusting rsp appropriately.
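As a rough sketch of the one-time alignment, assuming a frame pointer is kept so the original RSP can be restored (the label and message are made up for illustration):

```nasm
; align.asm (hypothetical name)
default rel
section .data
msg:    db "stack aligned", 0

section .text
extern puts
global main
main:
    push rbp
    mov  rbp, rsp           ; keep the original RSP so it can be restored later
    and  rsp, -16           ; clear the low 4 bits: round RSP down to a multiple of 16
    lea  rdi, [rel msg]     ; 1st argument: string to print
    call puts wrt ..plt
    xor  eax, eax           ; return 0 from main
    mov  rsp, rbp           ; undo the alignment adjustment
    pop  rbp
    ret
```

The AND is harmless if RSP is already aligned, and no scratch register is needed to remember the padding because RBP holds the old value.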
The SYSV ABI doesn't use shadow space (that's the Microsoft convention); instead it uses a "red zone", but that doesn't affect the calling sequence.
Update about stack alignment:
In functions that already get aligned RSP (generally everything except main), you just make sure any called functions in turn get RSP that's changed by a multiple of 16.
If you are using a standard frame pointer, then your functions start with a PUSH RBP, so you only have to make sure you allocate space in multiples of 16 (if needed), like so:
push rbp
mov rbp, rsp
sub rsp, n*16
...
mov rsp, rbp
pop rbp
ret
Otherwise, you'll have to compensate for the 8 bytes of RIP put on the stack (as you correctly pointed out in your comment):
sub rsp, n*16+8
...
add rsp, n*16+8
ret
Both of the above apply only if you call other functions; in leaf functions you can do whatever you want. In addition, the red zone I mentioned earlier is useful in leaf functions, because you can use 128 bytes below the stack pointer without explicit allocation, meaning you don't have to adjust RSP at all:
; in leaf functions you can use memory under the stack pointer
; (128 byte red zone)
mov [rsp-8], rax
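Putting the last two points together, here is a small sketch (function and label names are hypothetical): a leaf function that spills into the red zone without touching RSP, and a main without a frame pointer that uses the n*16+8 form with n = 0 before calling anything.

```nasm
; redzone.asm (hypothetical name)
default rel
section .data
fmt:    db "sum = %ld", 10, 0

section .text
extern printf
global main

; long add_via_redzone(long a, long b)
; Leaf function: calls nothing, so it may use the red zone and never adjusts RSP.
add_via_redzone:
    mov  [rsp-8], rdi       ; spill a into the red zone
    mov  [rsp-16], rsi      ; spill b into the red zone
    mov  rax, [rsp-8]
    add  rax, [rsp-16]      ; return a + b in RAX
    ret

main:
    sub  rsp, 8             ; n*16+8 with n = 0: compensates for the pushed return address
    mov  edi, 40            ; writing EDI zero-extends into RDI
    mov  esi, 2
    call add_via_redzone
    lea  rdi, [rel fmt]     ; 1st argument: format string
    mov  rsi, rax           ; 2nd argument: the sum
    xor  eax, eax           ; AL = 0: no vector registers used
    call printf wrt ..plt   ; prints "sum = 42"
    xor  eax, eax           ; return 0 from main
    add  rsp, 8
    ret
```

Note that main itself must not keep data in the red zone across a call, since the call's return address and the callee's frame overwrite that area.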