
Dlang - Understanding std.range.cycle() in assembly

import std.range : cycle;
void foo() pure @safe {
    cycle([1, 2]);
}

Today I encountered a program written in the D language. I'm trying to understand its assembly code, starting with a simple function.

From the asm output on the D compiler explorer:

pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
 push   rbp
 mov    rbp,rsp
 sub    rsp,0x40
 mov    QWORD PTR [rbp-0x20],rdi
 mov    QWORD PTR [rbp-0x10],rsi
 mov    QWORD PTR [rbp-0x8],rdx
 ... rest of the function

I've read it several times, but I can't understand why std.range.cycle() takes 3 arguments (RDI, RSI and RDX), or where my range ([1, 2]) is. Is it not a C-like structure?

Or am I missing something?

Khánh Tạ Quang asked Mar 11 '23 02:03


1 Answer

It looks like you're using the x86-64 System V ABI, judging by the use of RDI and RSI for argument passing; the Windows 64-bit ABI uses different registers. See the x86 tag wiki for links to ABI docs, or see the current revision here.

Small objects (structs of up to 16 bytes, like your int[] argument) passed by value are split across up to two integer registers. Large objects (more than 128 bits) returned by value don't get packed into RDX:RAX; instead the caller allocates space for them and passes a pointer to it as a hidden first argument. This is what happens in your function.

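Based on the asm and docs, I think a Cycle object has three values: start, end, and index. I don't know D at all, but it would make sense. (If D slices are laid out as length then pointer, as D's ABI documentation describes, the first two are really length and pointer, which would also explain the div by the first one in the code below.) Since they're all 64-bit, that makes 24 bytes, too large to fit in RDX:RAX, so it's returned by hidden pointer.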
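As a rough check of that size reasoning, the sketch below asks the compiler for the sizes directly. It assumes a 64-bit target and that Cycle!(int[]) stores nothing beyond the slice and one index word, which is what the three 8-byte copies in the asm suggest. sliceWords and cycleWords are names made up for this example.

```d
import std.range : Cycle, cycle;

// Sizes measured in machine words (size_t units); names made up for this sketch.
enum sliceWords = (int[]).sizeof / size_t.sizeof;
enum cycleWords = Cycle!(int[]).sizeof / size_t.sizeof;

void main()
{
    import std.stdio : writeln;

    int[] arr = [1, 2];
    auto c = cycle(arr);
    // Matches the return type in the mangled name from the asm listing.
    static assert(is(typeof(c) == Cycle!(int[])));

    // A slice exposes its two words as the .length and .ptr properties.
    writeln("slice: length=", arr.length, " ptr=", arr.ptr);
    writeln("int[] words: ", sliceWords);         // small enough for two registers
    writeln("Cycle!(int[]) words: ", cycleWords); // too big for RDX:RAX if > 2
}
```

If cycleWords comes out as anything other than 3, the layout guess is wrong for that Phobos version and the register assignment deserves a second look.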

The arg-passing registers on entry to cycle() are:

  • RDI: "hidden" pointer to the return value (which is a struct of three 64-bit integers)
  • RSI: first member of the Range arg (I'll call it range_start)
  • RDX: second member of the Range arg (I'll call it range_end)
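To make the hidden pointer concrete, here is a hand-written sketch of what the call amounts to once that pointer is spelled out as an ordinary parameter. CycleLike and cycleLowered are hypothetical names, and the field names (length, ptr) are my reading of how D lays out a slice, not something the asm states.

```d
// Hypothetical stand-in for the three 64-bit values the function returns;
// the field names are guesses, not Phobos' actual members.
struct CycleLike
{
    size_t length; // first word of the slice arg (arrives in RSI)
    int* ptr;      // second word of the slice arg (arrives in RDX)
    size_t index;
}

// The call with the hidden pointer written out by hand:
// `ret` plays the role of RDI and is handed back the way RAX is.
CycleLike* cycleLowered(CycleLike* ret, size_t length, int* ptr)
{
    ret.length = length;
    ret.ptr = ptr;
    ret.index = 0;
    return ret;
}

void main()
{
    int[] arr = [1, 2];
    CycleLike result; // the caller owns the return buffer
    auto p = cycleLowered(&result, arr.length, arr.ptr);
    assert(p is &result);
    assert(result.length == 2 && result.ptr is arr.ptr && result.index == 0);
}
```

The caller reserving the buffer is the point: the callee never allocates anything, it only fills in memory it was given.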

I enabled optimization to get more readable asm without so much noise, but it looks like this D compiler is a lot less sophisticated than clang or gcc, unfortunately. With -O -release -inline (as recommended by this page), it still does a lot of store/reload to the stack.

pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
 sub    rsp,0x28
 mov    QWORD PTR [rsp+0x20],rdi        # hidden first arg (return-value pointer).
 mov    QWORD PTR [rsp+0x8],0x0         # totally useless: overwritten without read
 mov    QWORD PTR [rsp+0x10],0x0        # totally useless: same.
 mov    QWORD PTR [rsp+0x8],rsi         # first "real" arg
 mov    QWORD PTR [rsp+0x10],rdx        # second "real" arg
 xor    eax,eax
 xor    edx,edx                         # zero rdx:rax, the 128-bit dividend.  Perhaps from the index=0 default when you only use one arg?
 div    QWORD PTR [rsp+0x8]             # divide 0 by first arg of the range.
 mov    QWORD PTR [rsp+0x18],rdx        # remainder of (index / range_start), I guess.
 lea    rsi,[rsp+0x8]                   # RSI=pointer to where range_start, range_end, and index/range_start were stored on the stack.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]  # copy to the dst buffer.  A smart compiler would have stored there in the first place, instead of to local scratch and then copying.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]  # movs is not very efficient, this is horrible code.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
 mov    rax,QWORD PTR [rsp+0x20]        # mov rax, rdi  before those MOVS instructions would have been much more efficient.
 add    rsp,0x28
 ret    
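The div is easier to place next to some D source. The sketch below is a hypothetical MiniCycle, not Phobos' Cycle: it assumes the constructor reduces the starting index modulo the slice length, which would explain dividing 0 by the first word of the argument and keeping the remainder from RDX. firstN is a helper made up for the demonstration.

```d
// Hypothetical miniature of a cycling range over int[]; not Phobos' code.
struct MiniCycle
{
    int[] original;
    size_t index;

    this(int[] input, size_t start = 0) pure nothrow @nogc @safe
    {
        original = input;
        // The remainder keeps the index in bounds. With start == 0 this is
        // a division of 0 by the length, like the div in the asm.
        index = start % input.length;
    }

    enum bool empty = false; // a cycle never runs out

    int front() const pure nothrow @nogc @safe
    {
        return original[index];
    }

    void popFront() pure nothrow @nogc @safe
    {
        index = (index + 1) % original.length;
    }
}

// Collects the first n elements; helper made up for this example.
int[] firstN(MiniCycle c, size_t n) pure nothrow @safe
{
    int[] result;
    foreach (_; 0 .. n)
    {
        result ~= c.front;
        c.popFront();
    }
    return result;
}

void main()
{
    assert(firstN(MiniCycle([1, 2]), 5) == [1, 2, 1, 2, 1]);
    assert(MiniCycle([1, 2, 3], 7).index == 1); // 7 % 3
}
```

With a constant start of 0 the remainder is always 0, so an optimizer that knew this could drop the div entirely and store a zero.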

The ABI requires functions that return large objects to return the hidden pointer in RAX, so the caller doesn't have to separately keep a copy of the pointer to the return buffer. That's why the function sets RAX at all.


A good compiler would have done this:

std.range.Cycle...:
   mov    [rdi], rsi           # cycle_start
   mov    [rdi+0x8], rdx       # cycle_end
   mov    QWORD PTR [rdi+0x10], 0   # index (a store of an immediate needs an explicit operand size)
   mov    rax, rdi
   ret

Or just inlined the call to cycle() entirely, since it's trivial. Actually, I think it did inline into foo(), but a stand-alone definition for cycle() is still emitted.

We can't tell which two functions foo() calls, because the compiler explorer seems to be disassembling the .o (not the linked binary) without resolving symbols. So the call offset is 00 00 00 00, a placeholder for the linker. But it's probably calling a memory allocation function, because it makes the call with esi=2 and edi=0. (Using mov edi, 0 in optimizing release mode! Yuck!). The call target shows as the next instruction, because that's where call's rel32 displacement counts from.
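The arithmetic behind that last point is small enough to write down. This is a sketch of how a 5-byte call rel32 (opcode E8) resolves its target; callTarget is a made-up helper and the addresses are arbitrary example values, not taken from the disassembly.

```d
// Target of an x86-64 `call rel32` (opcode E8): the displacement is added
// to the address of the NEXT instruction, and the call itself is 5 bytes.
ulong callTarget(ulong callAddr, int rel32) pure nothrow @nogc @safe
{
    enum callLength = 5; // 1 opcode byte + 4 displacement bytes
    // Sign-extend the displacement so backward calls wrap correctly.
    return callAddr + callLength + cast(long) rel32;
}

void main()
{
    // Unrelocated placeholder: a displacement of 0 "calls" the next instruction.
    assert(callTarget(0x10, 0) == 0x15);
    // After linking, a negative displacement reaches backwards.
    assert(callTarget(0x1000, -0x20) == 0xFE5);
}
```

So a disassembler that ignores relocations will always show such a call pointing just past itself, whatever the real callee is.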

Hopefully LDC or GDC do a better job, since they're based on modern optimizing backends (LLVM and gcc), but the compiler-explorer site you linked doesn't have those compilers installed. If there's another site based on Matt Godbolt's compiler explorer code, but with other D compilers, that would be cool.

Peter Cordes answered Mar 20 '23 14:03