Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does foldr use a helper function?

Tags:

haskell

fold

In explaining foldr to Haskell newbies, the canonical definition is

foldr            :: (a -> b -> b) -> b -> [a] -> b foldr _ z []     =  z foldr f z (x:xs) =  f x (foldr f z xs) 

But in GHC.Base, foldr is defined as

foldr k z = go           where             go []     = z             go (y:ys) = y `k` go ys 

It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.

like image 881
Matt Adams Avatar asked Oct 07 '14 18:10

Matt Adams


People also ask

Why do we use helper functions?

A helper function is a function that performs part of the computation of another function. Helper functions are used to make your programs easier to read by giving descriptive names to computations. They also let you reuse computations, just as with functions in general.

What does foldr mean in Haskell?

Foldr — foldr is a higher-order function in Haskell with the following type signature: foldr :: (a -> b -> b) -> b -> [a] -> b. The easiest way to think about foldr is that it takes a list, and replace the : operator with the given function, and the empty list with the given value.

Is foldr a recursive function?

foldr is not tail recursive. Sometime it is called the true "recursive" fold while fold left is "iterative" (because of tail recursion it is equivalent to iteration).


2 Answers

I can add some important details about GHC's optimization system.

The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to able to inline the definition of the function if it's known at compile time.

There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.

When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.

Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:

dropR :: Char -> String -> String dropR x r = if x == 'x' && null r then "" else x : r  main :: IO () main = do     s <- getLine     putStrLn $ foldr dropR "" s 

First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:

main :: IO () main = do     s <- getLine     -- I'm changing the where clause to a let expression for the sake of readability     putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s 

And that's the thing the worker-wrapper transformation allows.. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function call overhead. This is where the big performance win comes from.

like image 158
Carl Avatar answered Sep 21 '22 14:09

Carl


GHC cannot inline recursive functions, so

foldr            :: (a -> b -> b) -> b -> [a] -> b foldr _ z []     =  z foldr f z (x:xs) =  f x (foldr f z xs) 

cannot be inlined. But

foldr k z = go       where         go []     = z         go (y:ys) = y `k` go ys 

is not a recursive function. It is a non-recursive function with a local recursive definition!

This means that, as @bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.

like image 22
Joachim Breitner Avatar answered Sep 21 '22 14:09

Joachim Breitner