I was trying to find the source for a certain kind of inlining that happens in GHC, where a function that is passed as an argument to another function is inlined. For example, I may write a definition like the following (using my own List type to avoid rewrite rules):
data MyList a = Nil | Cons a (MyList a)
deriving (Show)
mapMyList :: (a -> b) -> MyList a -> MyList b
mapMyList f Nil = Nil
mapMyList f (Cons a rest) = Cons (f a) $ mapMyList f rest
followed by a call
fromList :: [a] -> MyList a
fromList = ...
main = do
print $ mapMyList (*2) $ fromList [1..5]
mapMyList
is recursive, so it is not directly inlinable. However, in the generated Core, I see the following definition:
Rec {
-- RHS size: {terms: 16, types: 11, coercions: 0, joins: 0/0}
Main.main_$smapMyList [Occ=LoopBreaker] :: MyList Int -> MyList Int
[GblId, Arity=1, Str=<S,1*U>, Unf=OtherCon []]
Main.main_$smapMyList
= \ (sc_s2Rb :: MyList Int) ->
case sc_s2Rb of {
Nil -> Main.Nil @Int;
Cons a_aBe rest_aBf ->
Main.Cons
@Int
(case a_aBe of { GHC.Types.I# x_a24u ->
GHC.Types.I# (GHC.Prim.*# x_a24u 2#)
})
(Main.main_$smapMyList rest_aBf)
}
end Rec }
In particular, smapMyList
no longer takes a function as a parameter, and the (* 2)
is inlined here. I wanted to know which optimization pass is generating this code, and how it works?
I found this related issue, which seems to be asking for a way to guarantee this behavior using a SPECIALIZE pragma, which led me to believe that the specialization pass is doing this. However, when reading the documentation on specialization in GHC, it seems to be for specializing typeclass dictionaries, not function arguments (there are no typeclasses in my example).
I also looked at static argument transformation which seems related; for example the GHC source says this about the pass:
May be seen as removing invariants from loops: Arguments of recursive functions that do not change in recursive calls are removed from the recursion, which is done locally and only passes the arguments which effectively change.
However, I tried disabling both of these passes with -fno-static-argument-transformation -fno-specialise
and found that this transformation still happens anyway.
My motivation for asking this is that I'm implementing this transformation in another language (Koka) so I'd like to understand how other languages do it.
Complete test program
Generated core (after disabling specialisation and static argument transformation)
The optimization is called "call-pattern specialization" (a.k.a. SpecConstr) this specializes functions according to which arguments they are applied to. The optimization is described in the paper "Call-pattern specialisation for Haskell programs" by Simon Peyton Jones. The current implementation in GHC is different from what is described in that paper in two highly relevant ways:
Here is the relevant part of the core that is produced without this optimization, using -fno-spec-constr
, and with the -dsuppress-all -dsuppress-uniques -dno-typeable-binds
flags:
Rec {
mapMyList
= \ @ a @ b f ds ->
case ds of {
Nil -> Nil;
Cons a1 rest -> Cons (f a1) (mapMyList f rest)
}
end Rec }
Rec {
main_go
= \ x ->
Cons
(I# x)
(case x of wild {
__DEFAULT -> main_go (+# wild 1#);
5# -> Nil
})
end Rec }
main3 = \ ds -> case ds of { I# x -> I# (*# x 2#) }
main2
= $fShowMyList_$cshowsPrec
$fShowInt $fShowMyList1 (mapMyList main3 (main_go 1#))
main1 = main2 []
main = hPutStr' stdout main1 True
I think this optimization is a bit disappointing because it only applies to uses within the same module. Also, for a long time (since the "Stream Fusion. From Lists to Streams to Nothing at All" paper from 2007) people have hoped that this optimization could optimize stream fusion. However, as I understand it, nobody has been able to make it work properly and reliably for that purpose (see this GHC issue). So now we still use the inferior foldr/build
fusion in base.
I started a discussion about possibilities of improving the status quo in this GHC issue. I think the most promising line of future work would be to improve the static argument transform optimization. See this comment by Sebastian Graf.
I have done some more debugging, specifically I have used the -dverbose-core2core
option and inspected the results. As I expected the optimization happens because of the SpecConstr optimization. Here is the core before SpecConstr (after a Simplifier pass):
Rec {
mapMyList
= \ @ a @ b f ds ->
case ds of {
Nil -> Nil;
Cons a rest -> Cons (f a) (mapMyList f rest)
}
end Rec }
lvl = \ ds -> case ds of { I# x -> I# (*# x 2#) }
main
= case toList (mapMyList lvl (fromList (eftInt 1# 5#))) of {
...
And here is the core after SpecConstr:
Rec {
$smapMyList
= \ sc ->
case sc of {
Nil -> Nil;
Cons a rest -> Cons (lvl a) (mapMyList lvl rest)
}
mapMyList
= \ @ a @ b f ds ->
case ds of {
Nil -> Nil;
Cons a rest -> Cons (f a) (mapMyList f rest)
}
end Rec }
lvl = \ ds -> case ds of { I# x -> I# (*# x 2#) }
main
= case toList (mapMyList lvl (fromList (eftInt 1# 5#))) of {
...
So, you can see that SpecConstr creates that $smapMyList
function which is a version of mapMyList
specialized to the lvl
argument, which is the *2
function.
Note that this specialized function is not used yet, that is done using a newly created rewrite rule which fires afterwards when the Simplifier runs. If you use the flag -ddump-rule-rewrites
you can see them in action:
Rule fired
Rule: SC:mapMyList0
Module: (Main)
Before: mapMyList TyArg Int TyArg Int ValArg lvl ValArg rest
After: (\ sc -> $smapMyList sc) rest
Cont: Stop[BoringCtxt] MyList Int
Rule fired
Rule: SC:mapMyList0
Module: (Main)
Before: mapMyList
TyArg Int TyArg Int ValArg lvl ValArg fromList (eftInt 1# 5#)
After: (\ sc -> $smapMyList sc) (fromList (eftInt 1# 5#))
Cont: StrictArg toList
Select nodup wild
Stop[RhsCtxt] String
This is the core after that subsequent Simplifier pass (it has also inlined lvl
):
Rec {
$smapMyList
= \ sc ->
case sc of {
Nil -> Nil;
Cons a rest ->
Cons (case a of { I# x -> I# (*# x 2#) }) ($smapMyList rest)
}
end Rec }
main
= case toList ($smapMyList (fromList (eftInt 1# 5#))) of {
...
This also suggests a reason why full laziness is also necessary for this optimization: the lvl
function needs to be floated out because SpecConstr doesn't work on lambdas. This floating out requires full laziness.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With