Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What do ghc's FUN_1_0, FUN_0_1, etc closure types mean?

Tags:

haskell

ghc

Closure types of FUN_1_0, THUNK_1_0 etc often appear in heap profiles, but aren't documented anywhere. The undocumented types in ghc-heap correspond to these constants, and there are some further clues in Layout.hs but it's still not clear what these mean.

This answer suggests FUN_1_0 and FUN_2_0 correspond to functions of one or two arguments, but that seems wrong (since there's no FUN_3_0 for instance)

What do the various FUN_ and THUNK_ types mean?

like image 552
jberryman Avatar asked Aug 25 '20 15:08

jberryman


1 Answers

The variants FUN_m_n and THUNK_m_n are just special cases of FUN and THUNK meant to speed garbage collection. (There's also a similar set of CONSTR_m_n special cases for CONSTR.)

A FUN heap object is a function closure. A THUNK heap object is a suspended computation of an expression. Both types of object potentially contain a payload of several word-sized fields giving the closed values for any free variables in the function body or thunk expression. A CONSTR represents a saturated constructor, and its payload includes values for its fields. The Making a Fast Curry paper has some useful background on this.

In the payloads for any of these closure types, some of those fields may be pointers (to boxed values) and some may be non-pointers (unboxed values appearing directly in the closure). When the garbage collector is walking the heap, it needs to follow pointers and ignore non-pointers. For all FUNs, THUNKs, and CONSTRs (whether special case _m_n types or general), their info tables have a bitmap that specifies how many fields are in the payload and which fields contain pointers. However, there's a cost for the garbage collector to redirect to the info table and process the bitmap.

To speed garbage collection, a few special cases are defined with a different closure type, as FUN_m_n, etc., where m is the number of pointer fields, and n is the number of non-pointer fields. For the variants _0_1, _1_0, _2_0, and _0_2, the garbage collector can immediately determine both the number of fields total and the fact that all are pointers (and so should be followed) or non-pointers (and so should be skipped), without redirecting to the info table bitmap. For the _1_1 variants, there's an ambiguity, since it's not clear which order the pointer and non-pointer appear in the payload, so the order pointer-then-non-pointer has been chosen. If a closure has a non-pointer-then-pointer, it falls back to the general FUN or THUNK bitmap scheme. You can see the algorithm at work in rts/sm/Scav.c function scavenge_block, which is one of the few pieces of code that actually treats the variants differently.

like image 64
K. A. Buhr Avatar answered Nov 18 '22 00:11

K. A. Buhr