I've been trying to find the OCaml calling convention so that I can manually interpret the stack traces that gdb can't parse. Unfortunately, it seems like nothing has ever been written down in English except for general observations. E.g., people will comment on blogs that OCaml passes many arguments in registers. (If there is English documentation somewhere, a link would be much appreciated.)
So I've been trying to puzzle it out from the ocamlopt source. Could anyone confirm the accuracy of these guesses?
And, if I'm right about the first ten arguments being passed in registers, is it just not generally possible to recover the arguments to a function call? In C, the arguments would still be pushed onto the stack somewhere, if only I walk back up to the correct frame. In OCaml, it would seem that callees are free to destroy their callers' arguments.
Register allocation (from /asmcomp/amd64/proc.ml)
For calling into OCaml functions,
For calling into C functions, the standard amd64 C convention is used:
Return address (from /asmcomp/amd64/emit.mlp)
The return address is the first pointer pushed into the call frame, in accordance with amd64 C convention. (I'm guessing the ret instruction assumes this layout.)
Exceptions (from /asmcomp/linearize.ml)
The code try (...body...) with (...handler...); (...rest...) gets linearized like this:
```
Lsetuptrap .body
(...handler...)
Lbranch .join
Llabel .body
Lpushtrap
(...body...)
Lpoptrap
Llabel .join
(...rest...)
```
and then emitted as assembly like this (destinations on the right):
```
call .body
(...handler...)
jmp .join
.body:
pushq %r14
movq %rsp, %r14
(...body...)
popq %r14
addq $8, %rsp
.join:
(...rest...)
```
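To check this reading of the two listings, here is a toy OCaml model of the same lowering. It is a sketch, not the compiler's code: the instr type only mirrors the opcode names from linearize.ml, Code is a hypothetical placeholder for arbitrary handler/body/rest code, and linearize_try, emit and assemble are made-up names. The Lpoptrap case is written in AT&T operand order (addq $8, %rsp). The real emitter in emit.mlp does more bookkeeping than this.

```ocaml
(* Toy model of the try/with lowering: linearized opcodes, then the
   amd64 (AT&T syntax, destination last) text each one becomes.
   Code is a hypothetical stand-in for ordinary code. *)
type instr =
  | Lsetuptrap of string   (* call into the protected body *)
  | Lpushtrap              (* build the trap frame *)
  | Lpoptrap               (* tear it down on normal exit *)
  | Lbranch of string
  | Llabel of string
  | Code of string         (* placeholder for handler/body/rest *)

(* Instruction order for: try body with handler; rest *)
let linearize_try ~body ~handler ~rest =
  [ Lsetuptrap ".body"; Code handler; Lbranch ".join";
    Llabel ".body"; Lpushtrap; Code body; Lpoptrap;
    Llabel ".join"; Code rest ]

let emit = function
  | Lsetuptrap lbl -> [ "call " ^ lbl ]
  | Lpushtrap -> [ "pushq %r14"; "movq %rsp, %r14" ]
  | Lpoptrap -> [ "popq %r14"; "addq $8, %rsp" ]  (* drop the handler address *)
  | Lbranch lbl -> [ "jmp " ^ lbl ]
  | Llabel lbl -> [ lbl ^ ":" ]
  | Code c -> [ c ]

let assemble instrs = List.concat (List.map emit instrs)

let () =
  linearize_try ~body:"(...body...)" ~handler:"(...handler...)" ~rest:"(...rest...)"
  |> assemble
  |> List.iter print_endline
```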
Somewhere in the body, there's a linearized opcode Lraise which gets emitted as this exact assembly:
```
movq %r14, %rsp
popq %r14
ret
```
Which is really neat! Instead of this setjmp/longjmp business, we create a dummy frame whose return address is the exception handler and whose only local is the previous such dummy frame. The /asmcomp/amd64/proc.ml has a comment calling %r14 the "trap pointer", so I'll call this dummy frame the trap frame. When we want to raise an exception, we set the stack pointer to the most recent trap frame, set the trap pointer to the trap frame before that, and then "return" into the exception handler. And I bet if the exception handler can't handle this exception, it just reraises it.
The exception is in %rax.
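The trap-frame mechanism can be checked with a small simulation. This is a sketch under simplifying assumptions: the machine has only a downward-growing stack of 8-byte words, %rsp and %r14; return addresses are modeled as handler names; the exception value itself is not modeled; and setuptrap, pushtrap, poptrap and do_raise are hypothetical helpers that each replay the assembly shown for the corresponding opcode.

```ocaml
(* Toy machine: a downward-growing stack of 8-byte words plus %rsp and
   %r14 (the trap pointer). Return addresses are modeled as handler names. *)
type word = Addr of string | Ptr of int

type machine = {
  mem : (int, word) Hashtbl.t;   (* stack address -> stored word *)
  mutable rsp : int;
  mutable r14 : int;
}

let create () = { mem = Hashtbl.create 16; rsp = 0x1000; r14 = 0 }

let push m w = m.rsp <- m.rsp - 8; Hashtbl.replace m.mem m.rsp w

let pop m =
  let w = Hashtbl.find m.mem m.rsp in
  m.rsp <- m.rsp + 8;
  w

let pop_ptr m =
  match pop m with Ptr p -> p | Addr _ -> failwith "expected a saved trap pointer"

(* Lsetuptrap: "call .body" pushes the address of the next instruction,
   i.e. the start of the handler code. *)
let setuptrap m ~handler = push m (Addr handler)

(* Lpushtrap: pushq %r14; movq %rsp, %r14 *)
let pushtrap m = push m (Ptr m.r14); m.r14 <- m.rsp

(* Lpoptrap: popq %r14; addq $8, %rsp (drops the handler address) *)
let poptrap m = m.r14 <- pop_ptr m; m.rsp <- m.rsp + 8

(* Lraise: movq %r14, %rsp; popq %r14; ret. Returns the handler reached. *)
let do_raise m =
  m.rsp <- m.r14;
  m.r14 <- pop_ptr m;
  match pop m with Addr h -> h | Ptr _ -> failwith "expected a return address"

let () =
  let m = create () in
  setuptrap m ~handler:"outer_handler"; pushtrap m;   (* try ...       *)
  setuptrap m ~handler:"inner_handler"; pushtrap m;   (*   try ...     *)
  push m (Ptr 42);                                    (*     body work *)
  print_endline (do_raise m);   (* prints inner_handler *)
  print_endline (do_raise m)    (* reraise: prints outer_handler *)
```

With two nested trys, the first raise lands in the inner handler with %r14 pointing at the outer trap frame again, so a reraise from that handler lands in the outer one.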
A calling convention governs how functions on a particular architecture and operating system interact. This includes rules about how function arguments are placed, where return values go, which registers functions may use, how they may allocate local variables, and so forth.
Callee vs caller saved is a convention for who is responsible for saving and restoring the value in a register across a call. ALL registers are "global" in that any code anywhere can see (or modify) a register and those modifications will be seen by any later code anywhere.
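To make the caller-/callee-saved split concrete: the registers a caller must save around a call are the ones live across the call minus the ones the callee promises to preserve. The sketch assumes the System V amd64 callee-saved set (rbx, rbp, r12 to r15) for C and, as suggested further down in this thread, an empty callee-saved set for OCaml's allocatable registers; must_spill and the string register names are illustrative only.

```ocaml
module RegSet = Set.Make (String)

(* Registers the caller has to spill/reload itself around a call. *)
let must_spill ~live_across_call ~callee_saved =
  RegSet.diff live_across_call callee_saved

(* System V amd64 C convention *)
let sysv_c_callee_saved =
  RegSet.of_list [ "rbx"; "rbp"; "r12"; "r13"; "r14"; "r15" ]

(* Assumption: OCaml treats every allocatable register as caller-save. *)
let ocaml_callee_saved = RegSet.empty

let () =
  let live = RegSet.of_list [ "rbx"; "r12"; "rdi" ] in
  let show s = String.concat " " (RegSet.elements s) in
  Printf.printf "C caller spills: %s\n"
    (show (must_spill ~live_across_call:live ~callee_saved:sysv_c_callee_saved));
  Printf.printf "OCaml caller spills: %s\n"
    (show (must_spill ~live_across_call:live ~callee_saved:ocaml_callee_saved))
```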
In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment). A version of cdecl is described in System V ABI for i386 systems.
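The alignment rule amounts to rounding up to a multiple of 16. A minimal sketch, assuming the only thing pushed between the aligned call site and the callee's frame is the return address (no saved frame pointer); align and stack_adjust are hypothetical helper names. Because the stack is 16-byte aligned at the call instruction, the callee's locals plus the return address must add up to a multiple of 16 for the callee's own calls to be aligned too.

```ocaml
(* Round n up to a multiple of a; a must be a power of two. *)
let align n a = (n + a - 1) land lnot (a - 1)

(* Bytes to subtract from the stack pointer on entry so the callee's own
   calls happen on a 16-byte boundary, assuming only the return address
   (ret_addr bytes) was pushed since the aligned call site. *)
let stack_adjust ~locals ~ret_addr = align (locals + ret_addr) 16 - ret_addr

let () =
  Printf.printf "8-byte return address, 20 bytes of locals: sub %d\n"
    (stack_adjust ~locals:20 ~ret_addr:8);
  Printf.printf "4-byte return address, 20 bytes of locals: sub %d\n"
    (stack_adjust ~locals:20 ~ret_addr:4)
```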
A calling convention is a scheme for how functions receive parameters from their caller and how they return a result. Calling conventions can differ in where parameters and return values are placed (in registers, on the call stack, or a mix of both) and in the order in which they are placed.
This is more an answer than a question! What little I know on this topic I learned by reading the source, just like you, so don't expect the details below to be much more authoritative than your post.
Yes, I think OCaml uses specialized calling conventions with caller-save registers only. A benefit of this choice is that it simplifies tail calls: when you make a tail call¹, you don't have to spill or reload any registers.
¹: For non-self tail calls, this only works when there are few enough arguments that none of them has to be spilled to the stack. If stack allocation is needed, the call is turned into a non-tail call.
Note that calling conventions still depend strongly on the target architecture. On x86, for example, a small number of global variables are used to pass arguments once the registers are exhausted, before falling back to the stack, in order to preserve tail calls.
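The three-tier placement described above can be sketched as follows: registers first, then a small pool of global slots, then the stack, with a non-self tail call only possible when no argument lands on the stack. This is an illustration under stated assumptions, not the x86 backend: the register and global counts are parameters rather than the real values, floats are ignored, and loc, place_args and can_tail_call are hypothetical names.

```ocaml
type loc =
  | Reg of int      (* i-th argument register *)
  | Global of int   (* i-th global "extra parameter" slot *)
  | Stack of int    (* byte offset in the outgoing stack area *)

(* Place n_args integer arguments, leftmost first. *)
let place_args ~n_regs ~n_globals ~word_size n_args =
  List.init n_args (fun i ->
      if i < n_regs then Reg i
      else if i < n_regs + n_globals then Global (i - n_regs)
      else Stack ((i - n_regs - n_globals) * word_size))

(* Per the footnote above: the call can stay a jump only if nothing
   has to be allocated on the stack. *)
let can_tail_call locs =
  List.for_all (function Stack _ -> false | Reg _ | Global _ -> true) locs

let () =
  let locs = place_args ~n_regs:2 ~n_globals:2 ~word_size:4 5 in
  Printf.printf "5 args, tail call possible: %b\n" (can_tail_call locs)
```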
I also agree on "leftmost-first-in": arguments are traversed in order by calling_conventions in proc.ml and stored in offset order by slot_offset in emit.mlp; in selectgen.ml they were computed right-to-left, but returned in order.
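The ordering rules can be sketched like this. calling_conventions below is a simplified model loosely following the shape of the function of that name in proc.ml, not a copy: arguments are visited left to right, each class (Int, Float) has its own register budget (n_int and n_float are parameters, not the real register counts), and overflow arguments receive increasing 8-byte stack offsets, so the leftmost overflow argument gets the smallest offset. eval_right_to_left is a hypothetical helper illustrating "computed right-to-left, but returned in order".

```ocaml
type ty = Int | Float
type loc = Reg of ty * int | Stack of int  (* index within class, or byte offset *)

(* Returns each argument's location and the size of the outgoing stack area. *)
let calling_conventions ~n_int ~n_float (args : ty array) =
  let locs = Array.make (Array.length args) (Stack 0) in
  let next_int = ref 0 and next_float = ref 0 and ofs = ref 0 in
  for i = 0 to Array.length args - 1 do
    let next, limit =
      match args.(i) with Int -> (next_int, n_int) | Float -> (next_float, n_float)
    in
    if !next < limit then begin
      locs.(i) <- Reg (args.(i), !next);
      incr next
    end else begin
      locs.(i) <- Stack !ofs;   (* offsets grow in argument order *)
      ofs := !ofs + 8           (* every slot is 8 bytes here, as on amd64 *)
    end
  done;
  (locs, !ofs)

(* Run the thunks from right to left, but return results in list order:
   fold_right evaluates the recursive result before calling the lambda. *)
let eval_right_to_left thunks =
  List.fold_right (fun thunk acc -> let v = thunk () in v :: acc) thunks []

let () =
  let _, size = calling_conventions ~n_int:2 ~n_float:1 [| Int; Float; Int; Int |] in
  Printf.printf "outgoing stack bytes: %d\n" size
```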