Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Under what circumstances could Common Subexpression Elimination affect the laziness of a Haskell program?

Tags:

haskell

ghc

From wiki.haskell.org:

First of all, common subexpression elimination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

foo x = (bar x) * (bar x)

might be transformed into

foo x = let x' = bar x in x' * x'

thus, the bar function is only called once. (And if bar is a particularly expensive function, this might save quite a lot of work.) GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."

I'm wondering under what circumstances CSE "affects" the strictness/laziness of the program and what kind of effect that could be.

like image 903
MaiaVictor Avatar asked Nov 19 '16 15:11

MaiaVictor


1 Answers

The naive CSE rule would be

e'[e, e]  ~>  let x = e in e'[x, x].

That is, whenever a subexpression e occurs twice in the expression e', we use a let-binding to compute e once. This however leads itself to some trivial space leaks. For example

sum [1..n] + prod [1..n]

is typically O(1) space usage in a lazy functional programming language like Haskell (as sum and prod would tail-recurse and blah blah blah), but would become O(n) when the naive CSE rule is enacted. This can be terrible for programs when n is high!

The approach is then to make this rule more specific, restricting it to a small set of cases that we know won't have the problem. We can begin by more specifically enumerating the problems with the naive rule, which will form a set of priorities for us to develop a better CSE:

  • The two occurrences of e might be far apart in e', leading to a long lifetime for the let x = e binding.

  • The let-binding must always allocate a closure where previously there might not have been one.

  • This can create an unbound number of closures.

  • There are cases where the closure might never deallocate.

Something better

let x = e in e'[e]  ~>  let x = e in e'[x]

This is a more conservative rule but is much safer. Here we recognize that e appears twice but the first occurrence syntactically dominates the second expression, meaning here that the programmer has already introduced a let-binding. We can safely just reuse that let-binding and replace the second occurrence of e with x. No new closures are allocated.

Another example of syntactic domination:

case e of { x -> e'[e] }  ~> case e of { x -> e'[x] }

And yet another:

case e of {
   Constructor x0 x1 ... xn ->
     e'[e]
}

~>

case e of {
   Constructor x0 x1 ... xn ->
     e'[Constructor x0 x1 ... xn]
}

These rules all take advantage of existing structure in the program to ensure that the kinetics of space usage remain the same before and after the transformation. They are much more conservative than the original CSE but they are also much safer.

See also

For a full discussion of CSE in a lazy FPL, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long footnotes. The comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!

like image 193
hao Avatar answered Oct 03 '22 17:10

hao