Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can GHC really never inline map, scanl, foldr, etc.?

I've noticed the GHC manual says "for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored."

Doesn't this say every application of common recursive functional constructs like map, zip, scan*, fold*, sum, etc. cannot be inlined?

You could always rewrite all these function when you employ them, adding appropriate strictness tags, or maybe employ fancy techniques like the "stream fusion" recommended here.

Yet, doesn't all this dramatically constrain our ability to write code that's simultaneously fast and elegant?

like image 246
Jeff Burdges Avatar asked Mar 11 '12 20:03

Jeff Burdges


2 Answers

Indeed, GHC cannot at present inline recursive functions. However:

  • GHC will still specialise recursive functions. For instance, given

    fac :: (Eq a, Num a) => a -> a
    fac 0 = 1
    fac n = n * fac (n-1)
    
    f :: Int -> Int
    f x = 1 + fac x
    

    GHC will spot that fac is used at type Int -> Int and generate a specialised version of fac for that type, which uses fast integer arithmetic.

    This specialisation happens automatically within a module (e.g. if fac and f are defined in the same module). For cross-module specialisation (e.g. if f and fac are defined in different modules), mark the to-be-specialised function with an INLINABLE pragma:

    {-# INLINABLE fac #-}
    fac :: (Eq a, Num a) => a -> a
    ...
    
  • There are manual transformations which make functions nonrecursive. The lowest-power technique is the static argument transformation, which applies to recursive functions with arguments which don't change on recursive calls (eg many higher-order functions such as map, filter, fold*). This transformation turns

    map f []     = []
    map f (x:xs) = f x : map f xs
    

    into

    map f xs0 = go xs0
      where
        go []     = []
        go (x:xs) = f x : go xs
    

    so that a call such as

     g :: [Int] -> [Int]
     g xs = map (2*) xs
    

    will have map inlined and become

     g [] = []
     g (x:xs) = 2*x : g xs
    

    This transformation has been applied to Prelude functions such as foldr and foldl.

  • Fusion techniques are also make many functions nonrecursive, and are more powerful than the static argument transformation. The main approach for lists, which is built into the Prelude, is shortcut fusion. The basic approach is to write as many functions as possible as non-recursive functions which use foldr and/or build; then all the recursion is captured in foldr, and there are special RULES for dealing with foldr.

    Taking advantage of this fusion is in principle easy: avoid manual recursion, preferring library functions such as foldr, map, filter, and any functions in this list. In particular, writing code in this style produces code which is "simultaneously fast and elegant".

  • Modern libraries such as text and vector use stream fusion behind the scenes. Don Stewart wrote a pair of blog posts (1, 2) demonstrating this in action in the now obsolete library uvector, but the same principles apply to text and vector.

    As with shortcut fusion, taking advantage of stream fusion in text and vector is in principle easy: avoid manual recursion, preferring library functions which have been marked as "subject to fusion".

  • There is ongoing work on improving GHC to support inlining of recursive functions. This falls under the general heading of supercompilation, and recent work on this seems to have been led by Max Bolingbroke and Neil Mitchell.

like image 62
rp123 Avatar answered Oct 18 '22 22:10

rp123


In short, not as often as you would think. The reason is that the "fancy techniques" such as stream fusion are employed when the libraries are implemented, and library users don't need to worry about them.

Consider Data.List.map. The base package defines map as

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

This map is self-recursive, so GHC won't inline it.

However, base also defines the following rewrite rules:

{-# RULES
"map"       [~1] forall f xs.   map f xs                = build (\c n -> foldr (mapFB c f) n xs)
"mapList"   [1]  forall f.      foldr (mapFB (:) f) []  = map f
"mapFB"     forall c f g.       mapFB (mapFB c f) g     = mapFB c (f.g) 
  #-}

This replaces uses of map via foldr/build fusion, then, if the function cannot be fused, replaces it with the original map. Because the fusion happens automatically, it doesn't depend on the user being aware of it.

As proof that this all works, you can examine what GHC produces for specific inputs. For this function:

proc1 = sum . take 10 . map (+1) . map (*2)

eval1 = proc1 [1..5]
eval2 = proc1 [1..]

when compiled with -O2, GHC fuses all of proc1 into a single recursive form (as seen in the core output with -ddump-simpl).

Of course there are limits to what these techniques can accomplish. For example, the naive average function, mean xs = sum xs / length xs is easily manually transformed into a single fold, and frameworks exist that can do so automatically, however at present there's no known way to automatically translate between standard functions and the fusion framework. So in this case, the user does need to be aware of the limitations of the compiler-produced code.

So in many cases compilers are sufficiently advanced to create code that's fast and elegant. Knowing when they will do so, and when the compiler is likely to fall down, is IMHO a large part of learning how to write efficient Haskell code.

like image 8
John L Avatar answered Oct 18 '22 21:10

John L