I compiled the following C code:
typedef struct {
long x, y, z;
} Foo;
long Bar(Foo *f, long i)
{
return f[i].x + f[i].y + f[i].z;
}
with the command gcc -S -O3 test.c. Here is the Bar function in the output:
.section __TEXT,__text,regular,pure_instructions
.globl _Bar
.align 4, 0x90
_Bar:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
leaq (%rsi,%rsi,2), %rcx
movq 8(%rdi,%rcx,8), %rax
addq (%rdi,%rcx,8), %rax
addq 16(%rdi,%rcx,8), %rax
popq %rbp
ret
Leh_func_end1:
I have a few questions about this assembly code:
- What is the purpose of "pushq %rbp", "movq %rsp, %rbp", and "popq %rbp", if neither rbp nor rsp is used in the body of the function?
- Why do rsi and rdi automatically contain the arguments to the C function (i and f, respectively) without reading them from the stack?
- I tried increasing the size of Foo to 88 bytes (11 longs) and the leaq instruction became an imulq. Would it make sense to design my structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)? The leaq instruction was replaced with:
imulq $88, %rsi, %rcx
These instructions simply set up and tear down the function's own stack frame. There's nothing really unusual about them. You should note, though, that because this function is so small, it will probably be inlined where it is called. Since Bar has external linkage, the compiler is still required to emit a standalone, "normal" version of the function. Also, see what @ouah said in his answer.
This is because that's how the AMD64 ABI specifies the arguments should be passed to functions.
If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
Page 20, AMD64 ABI Draft 0.99.5 – September 3, 2010
This is not directly related to the structure size but rather to the absolute address that the function has to access. If the size of the structure is 24 bytes, f is the address of the array containing the structures, and i is the index at which the array has to be accessed, then the byte offset of each structure is i*24. Multiplying by 24 in this case is achieved by a combination of lea and SIB addressing. The lea instruction calculates i*3, then every subsequent instruction takes that i*3 and multiplies it further by 8, thereby reaching the needed byte offset, and uses immediate displacements to access the individual structure members ((%rdi,%rcx,8), 8(%rdi,%rcx,8), and 16(%rdi,%rcx,8)). If you make the size of the structure 88 bytes, no single lea combined with a scaled addressing mode can do this: 88 = 11*8, and one lea can only multiply by 2, 3, 4, 5, 8, or 9. The compiler simply assumes that a single imulq will be more efficient in calculating i*88 than a series of shifts, adds, leas or anything else.
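The address arithmetic described above can be written out by hand in C. This is only a sketch to illustrate the i*3 then *8 decomposition, not what the compiler literally does: the name bar_by_hand is hypothetical, and the hard-coded offsets assume an LP64 target where long is 8 bytes and Foo is 24 bytes with no padding.

```c
#include <stddef.h>

typedef struct {
    long x, y, z;
} Foo;

/* Assumption: LP64 layout, so the constants 8 and 16 below are valid. */
_Static_assert(sizeof(Foo) == 24, "sketch assumes a 24-byte Foo");

/* Same result as Bar, with the byte offsets spelled out. */
long bar_by_hand(const Foo *f, long i)
{
    const char *base = (const char *)f;  /* f as a byte address (rdi) */
    long i3 = i + i * 2;                 /* leaq (%rsi,%rsi,2), %rcx  */

    /* disp(%rdi,%rcx,8) addresses base + i3*8 + disp */
    long x = *(const long *)(base + i3 * 8 + 0);
    long y = *(const long *)(base + i3 * 8 + 8);
    long z = *(const long *)(base + i3 * 8 + 16);

    return y + x + z;
}
```

For any valid index, bar_by_hand(f, i) should return the same value as Bar(f, i), since i3 * 8 equals i * 24, the byte offset of element i.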
- What is the purpose of pushq %rbp, movq %rsp, %rbp, and popq %rbp, if neither rbp nor rsp is used in the body of the function?
To keep track of the frames when you use a debugger. Add -fomit-frame-pointer to optimize (note that it should be enabled at -O3, but in a lot of the gcc versions I used it is not).