
Rewriting as a practical optimization technique in GHC: Is it really needed?

I was reading the paper by Simon Peyton Jones et al. titled “Playing by the Rules: Rewriting as a practical optimization technique in GHC”. In the second section, “The basic idea”, they write:

Consider the familiar map function, that applies a function to each element of a list. Written in Haskell, map looks like this:

map f []     = []
map f (x:xs) = f x : map f xs

Now suppose that the compiler encounters the following call of map:

map f (map g xs)

We know that this expression is equivalent to

map (f . g) xs

(where “.” is function composition), and we know that the latter expression is more efficient than the former because there is no intermediate list. But the compiler has no such knowledge.

One possible rejoinder is that the compiler should be smarter --- but the programmer will always know things that the compiler cannot figure out. Another suggestion is this: allow the programmer to communicate such knowledge directly to the compiler. That is the direction we explore here.
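
For context, the mechanism the paper goes on to propose lets the programmer state exactly this equivalence as a rewrite rule. In GHC syntax, the paper's running example looks like this:

{-# RULES
"map/map" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

With optimization enabled, GHC replaces any occurrence of the left-hand side with the right-hand side during simplification.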

My question is: why can't we make the compiler smarter? The authors say that “the programmer will always know things that the compiler cannot figure out”. However, that's not a valid answer, because the compiler can indeed figure out that map f (map g xs) is equivalent to map (f . g) xs, and here is how:

map f (map g xs)
  1. map g xs matches the left-hand side of map f [] = [] (instantiating f := g, xs := []).

    Hence map g [] = [].

  2. Therefore map f (map g []) = map f [].

    map f [] matches the left-hand side of map f [] = [].

    Hence map f (map g []) = [].

  3. map g xs matches the left-hand side of map f (x:xs) = f x : map f xs (instantiating f := g).

    Hence map g (x:xs) = g x : map g xs.

  4. Therefore map f (map g (x:xs)) = map f (g x : map g xs).

    map f (g x : map g xs) matches the left-hand side of map f (x:xs) = f x : map f xs (instantiating x := g x, xs := map g xs).

    Hence map f (map g (x:xs)) = f (g x) : map f (map g xs).

Hence we now have the rules:

map f (map g [])     = []
map f (map g (x:xs)) = f (g x) : map f (map g xs)

As you can see, f (g x) is just (f . g) x, and map f (map g xs) is being called recursively. This is exactly the definition of map (f . g) xs. The algorithm for this automatic conversion seems pretty simple. So why not implement this instead of rewrite rules?
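
For concreteness, here is the fused definition that the derivation above produces, written out as ordinary Haskell (mapMap is a name introduced here purely for illustration):

-- One traversal, no intermediate list; pointwise equal to
-- map (f . g) and hence to map f . map g.
mapMap :: (b -> c) -> (a -> b) -> [a] -> [c]
mapMap f g []     = []
mapMap f g (x:xs) = f (g x) : mapMap f g xs

-- For example, mapMap (+1) (*2) [1,2,3] == [3,5,7].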

asked Nov 09 '14 by Aadit M Shah


3 Answers

Aggressive inlining can derive many of the equalities that rewrite rules are shorthand for. The difference is that inlining is "blind": you don't know in advance whether the result will be better or worse, or even whether the process will terminate.

Rewrite rules, however, can do completely non-obvious things, based on much higher-level facts about the program. Think of rewrite rules as adding new axioms to the optimizer: with them, you have a richer rule set to apply, making complicated optimizations easier to express.

Stream fusion, for example, changes the data type representation. This cannot be expressed through inlining, as it involves a representation type change (we reframe the optimization problem in terms of the Stream ADT). Easy to state in rewrite rules, impossible with inlining alone.
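
As a sketch of what that representation change looks like, here is a heavily simplified Stream ADT in the spirit of the stream-fusion work; the names (stream, unstream, mapS, map') are illustrative, not the real library code:

{-# LANGUAGE ExistentialQuantification #-}

-- A stepper function plus an existentially hidden state type.
data Step s a = Done | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert between lists and streams.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []     = Done
    next (y:ys) = Yield y ys

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
             Done       -> []
             Yield x s' -> x : go s'

-- map over streams: non-recursive, so GHC can inline it freely.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
                Done       -> Done
                Yield x s' -> Yield (f x) s'

-- The key rewrite rule: adjacent conversions cancel, so pipelines
-- of stream operations fuse into a single loop with no lists.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}

map' :: (a -> b) -> [a] -> [b]
map' f = unstream . mapS f . stream

Note that the optimization now happens on Stream values, not lists. That change of representation is exactly what inlining alone cannot express.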

answered by Don Stewart


Something in that direction was investigated in the Bachelor's thesis of Johannes Bader, a student of mine: Finding Equations in Functional Programs (PDF file).

To some degree it is certainly possible, but

  • it is quite tricky. Finding such equations is, in a sense, as hard as finding proofs in a theorem prover, and
  • it is often not very useful, because it tends to find equations that the programmer would rarely write directly.

It is, however, useful for cleaning up after other transformations such as inlining and various forms of fusion, as the examples below suggest.
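
For illustration, here are the kinds of true-but-rarely-written equations such a tool might find (these particular examples are mine, not taken from the thesis):

{-# RULES
"map/id"      forall xs. map id xs              = xs
"filter/True" forall xs. filter (const True) xs = xs
  #-}

No programmer writes these left-hand sides by hand, but such shapes routinely appear in optimizer output after inlining and fusion have done their work.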

answered by Joachim Breitner


This can be viewed as a trade-off between optimizing for the specific case and optimizing for the general case. That trade-off can produce odd situations where you know how to make something faster, but it is better for the language as a whole if you don't.

In the specific case of maps in the structure you give, the compiler could find the optimization. However, what about related structures? What if the function isn't map? What if there's an additional layer of indirection, such as a function that returns map? In those cases, the compiler cannot optimize easily. This is the general-case problem.

If you do optimize the special case, one of two outcomes occurs:

  • Nobody relies on it, because they aren't sure whether it is there or not. In this case, articles like the one you quote get written.
  • People do start relying on it, and now every developer is forced to remember "maps done in this configuration get automatically converted to the fast version for me, but in that configuration they don't." This starts to shape the way people use the language, and can actually reduce readability!

Given the need for developers to think about such optimizations in the general case, we expect to see developers doing these optimizations themselves in the simple cases, decreasing the need for the optimization in the first place!

Now, if it turned out that the particular case you are interested in accounted for something massive, like 2% of the world's Haskell codebase, there would be a much stronger argument for applying your special-case optimization.

answered by Cort Ammon