import std.range : cycle;
void foo() pure @safe {
    cycle([1, 2]);
}
Today I encountered a program written in the D language. I'm trying to understand its assembly code, starting with a simple function.
From the asm output on the D compiler explorer:
pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
push rbp
mov rbp,rsp
sub rsp,0x40
mov QWORD PTR [rbp-0x20],rdi
mov QWORD PTR [rbp-0x10],rsi
mov QWORD PTR [rbp-0x8],rdx
... rest of the function
I've tried to read it several times, but I can't understand why std.range.cycle() gets 3 arguments (RDI, RSI and RDX), or where my range ([1, 2]) is. Isn't it a C-like structure? Or am I missing something?
It looks like you're using the x86-64 System V ABI, based on the use of rdi and rsi for arg passing; the Windows 64-bit ABI uses different registers. See the x86 tag wiki for links to ABI docs.
Small objects (like structs) passed by value go in multiple integer registers. Returning a large object (more than 128 bits) by value uses a hidden pointer to space allocated by the caller, instead of packing the value into RDX:RAX. This is what happens in your function.
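Based on the asm and docs, I think a Cycle object has three values: start, end, and index. I don't know D at all, but it would make sense. (D's ABI documentation describes a dynamic array as a length followed by a pointer, so the first two values are more likely the array's length and data pointer; that would also fit the div by the first value, as index % length.) Since they're all 64-bit, the object is 24 bytes, which is too large to fit in RDX:RAX, so it's returned by hidden pointer.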
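To check the size argument without reading asm, a minimal D sketch like the following can help. It assumes a 64-bit target for the numbers discussed here, and it prints the size of Cycle!(int[]) instead of asserting it, because that layout is a Phobos implementation detail that could differ between versions.

```d
import std.range : cycle;
import std.stdio : writeln;

void main()
{
    int[] arr = [1, 2];

    // A D dynamic array (slice) is two machine words: a length and a pointer.
    static assert(typeof(arr).sizeof == 2 * size_t.sizeof);
    assert(arr.length == 2);
    assert(*arr.ptr == 1);

    auto c = cycle(arr);

    // Sanity check that the range really cycles over the array.
    assert(c[0] == 1 && c[1] == 2 && c[2] == 1);

    // Printed, not asserted: anything above 16 bytes cannot come back in RDX:RAX.
    writeln("int[] size in bytes:         ", typeof(arr).sizeof);
    writeln("Cycle!(int[]) size in bytes: ", typeof(c).sizeof);
}
```

If the second number is larger than 16, the return value has to go through the hidden pointer, which matches the three 64-bit stores seen in the asm.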
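The arg-passing registers on entry to cycle() are:
RDI: hidden first arg (pointer to the caller-allocated space for the return value)
RSI: first "real" arg
RDX: second "real" arg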
I enabled optimization to get more readable asm without so much noise, but it looks like this D compiler is a lot less sophisticated than clang or gcc, unfortunately. With -O -release -inline (as recommended by this page), it still does a lot of store/reload to the stack.
pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
sub rsp,0x28
mov QWORD PTR [rsp+0x20],rdi # hidden first arg (return-value pointer).
mov QWORD PTR [rsp+0x8],0x0 # totally useless: overwritten without read
mov QWORD PTR [rsp+0x10],0x0 # totally useless: same.
mov QWORD PTR [rsp+0x8],rsi # first "real" arg
mov QWORD PTR [rsp+0x10],rdx # second "real" arg
xor eax,eax
xor edx,edx # zero rax:rdx. Perhaps from the index=0 default when you only use one arg?
div QWORD PTR [rsp+0x8] # divide 0 by first arg of the range.
mov QWORD PTR [rsp+0x18],rdx # remainder of (index / range_start), I guess.
lea rsi,[rsp+0x8] # RSI=pointer to where range_start, range_end, and index/range_start were stored on the stack.
movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi] # copy to the dst buffer. A smart compiler would have stored there in the first place, instead of to local scratch and then copying.
movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi] # movs is not very efficient, this is horrible code.
movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
mov rax,QWORD PTR [rsp+0x20] # mov rax, rdi before those MOVS instructions would have been much more efficient.
add rsp,0x28
ret
The ABI requires functions that return large objects to return the hidden pointer in RAX, so the caller doesn't have to separately keep a copy of the pointer to the return buffer. That's why the function sets RAX at all.
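The hidden-pointer convention can be sketched in D itself. The struct and function names below (MyCycle, makeCycle, makeCycleExplicit) are hypothetical stand-ins, not the Phobos definitions, and the field layout is an assumption based on the three values seen in the asm. The second function spells out by hand roughly what the ABI does for the first: take a destination pointer as an extra leading argument, fill it in, and return that same pointer.

```d
// Hypothetical stand-in for Cycle!(int[]); field names are assumptions.
struct MyCycle
{
    int[] original; // slice: length + pointer, two machine words
    size_t index;   // third machine word
}

// Returned by value: too big for RDX:RAX on x86-64 System V,
// so the caller passes a hidden pointer to the result buffer.
MyCycle makeCycle(int[] r, size_t index = 0)
{
    // Note: an empty slice would make this a division by zero.
    return MyCycle(r, index % r.length);
}

// The same thing with the hidden pointer written out explicitly.
MyCycle* makeCycleExplicit(MyCycle* ret, int[] r, size_t index = 0)
{
    ret.original = r;
    ret.index = index % r.length;
    return ret; // the ABI hands this pointer back in RAX
}

void main()
{
    static assert(MyCycle.sizeof == 3 * size_t.sizeof);

    int[] a = [1, 2];
    auto byValue = makeCycle(a);

    MyCycle buffer;
    auto p = makeCycleExplicit(&buffer, a);

    assert(p is &buffer);      // the same pointer comes back
    assert(byValue == buffer); // both versions build the same value
}
```

Compiling both functions and comparing their asm is one way to see whether the compiler treats the by-value return as the explicit-pointer form.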
A good compiler would have done this:
std.range.Cycle...:
mov [rdi], rsi # cycle_start
mov [rdi+0x8], rdx # cycle_end
mov QWORD PTR [rdi+0x10], 0 # index (operand size must be explicit when storing an immediate)
mov rax, rdi
ret
Or just inlined the call to Cycle entirely, since it's trivial. Actually, I think it did inline into foo(), but a stand-alone definition for cycle() is still emitted.
We can't tell which two functions foo() calls, because the compiler explorer seems to be disassembling the .o (not the linked binary) without resolving symbols. So the call offset is 00 00 00 00, a placeholder for the linker. But it's probably calling a memory allocation function, because it makes the call with esi=2 and edi=0. (Using mov edi, 0 in optimizing release mode! Yuck!) The call target shows as the next instruction, because that's where call's rel32 displacement counts from.
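The rel32 arithmetic is small enough to write down. This is a sketch with a hypothetical helper, callTarget, and made-up addresses; it only assumes the 5-byte direct near call encoding (opcode E8 followed by a 4-byte displacement).

```d
// Hypothetical helper: where a 5-byte `call rel32` (opcode E8) transfers control.
ulong callTarget(ulong callAddr, int rel32)
{
    enum ulong callLength = 5; // 1 opcode byte + 4 displacement bytes
    // The displacement is relative to the end of the call instruction.
    return callAddr + callLength + cast(long) rel32;
}

void main()
{
    // Unresolved placeholder 00 00 00 00: the "target" is just the next instruction.
    assert(callTarget(0x10, 0) == 0x15);

    // A forward and a backward displacement, with made-up numbers.
    assert(callTarget(0x10, 0x20) == 0x35);
    assert(callTarget(0x10, -0x15) == 0);
}
```

With the displacement still zero in the .o, the disassembler computes the address right after the call, which is why the listing appears to call the following instruction.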
Hopefully LDC or GDC do a better job, since they're based on modern optimizing backends (LLVM and gcc), but the compiler-explorer site you linked doesn't have those compilers installed. If there's another site based on Matt Godbolt's compiler explorer code, but with other D compilers, that would be cool.