Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the Haskell runtime distinguish between pointers and unboxed word-sized values?

On a 64-bit platform, OCaml's int type is 63-bits due to pointer tagging. This allows ints to be unboxed and still be distinguishable from pointers at runtime, allowing for a precise GC. IIRC, the GC in the GHC RTS is precise too, but GHC's Ints are 64-bit and they can be unboxed. If that's the case, then how the runtime system distinguish between Ints and pointers? It seems the same problem would arise with distinguishing between other unboxed word-sized values and pointers too.

like image 491
typesanitizer Avatar asked Mar 05 '23 19:03

typesanitizer


2 Answers

The short version is: values are allocated with all of the pointers and all of the non-pointers grouped together, and include a bit of metadata so that the GC knows what to follow.

Note that the Haskell report actually allows Int to be 31 or 63 bits, making the OCaml strategy valid -- but it's not what GHC does.

The slightly longer version is that said "metadata" is really a couple of functions that are used by the garbage collector. To give a rough sketch you could think of values in Haskell as being represented at runtime as an OO-style object with methods:

class Fn:
    # By far the most used; this evaluates the value:
    enter(...) -> ...
    # Used by the garbage collector:
    scavenge(...) -> ...
    evacuate(...) -> ...

The upshot of this is that values know enough about themselves to do the bookkeeping, and for some common layouts GHC defines specialized versions of these functions; the garbage collector can be mostly oblivious to exactly how scavenge and evacuate work. The segregating of pointers & non-pointers is so that a generic implementation can be made and shared for the common case.

Note that the 'enter' function exists even for Haskell values that are not "functions," since laziness means that even if the type is e.g. Int, evaluation may still involve computation.

If you want the very long version, I suggest reading:

https://www.microsoft.com/en-us/research/publication/implementing-lazy-functional-languages-on-stock-hardware-the-spineless-tagless-g-machine/

which goes into quite a lot of detail on how Haskell gets mapped to hardware. It's a fascinating read, and there's a lot of neat stuff in there that differs substantially from how most (strict) functional languages are implemented. The paper is old, but still relevant.

like image 181
zenhack Avatar answered Apr 30 '23 11:04

zenhack


Essentially, this is described in the GHC RTS documentation, when it details the format of the heap objects, comprising a header and a payload.

The header describes which words of the payload are pointers, so that garbage collection can work.

This means that there is, roughly, an overhead of a word for any heap object.

like image 24
chi Avatar answered Apr 30 '23 11:04

chi