foldl is tail recursive, so how come foldr runs faster than foldl?

Tags:

optimization

tail-recursion

haskell

combinators

fold

I wanted to test foldl vs foldr. From what I've seen, you should use foldl over foldr whenever you can because of tail recursion optimization.

This makes sense. However, after running this test I am confused:

foldr (takes 0.057s when using time command):

a::a -> [a] -> [a]
a x = ([x] ++ )

main = putStrLn(show ( sum (foldr a [] [0.. 100000])))

foldl (takes 0.089s when using time command):

b::[b] -> b -> [b]
b xs = ( ++ xs). (\y->[y])

main = putStrLn(show ( sum (foldl b [] [0.. 100000])))

It's clear that this example is trivial, but I am confused as to why foldr is beating foldl. Shouldn't this be a clear case where foldl wins?

371

asked Aug 07 '10 07:08

Ori

6 Answers

Welcome to the world of lazy evaluation.

When you think about it in terms of strict evaluation, foldl looks "good" and foldr looks "bad" because foldl is tail recursive, but foldr would have to build a tower in the stack so it can process the last item first.

However, lazy evaluation turns the tables. Take, for example, the definition of the map function:

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

This wouldn't be too good if Haskell used strict evaluation, since it would have to compute the tail first, then prepend the item (for all items in the list). The only way to do it efficiently would be to build the elements in reverse, it seems.

However, thanks to Haskell's lazy evaluation, this map function is actually efficient. Lists in Haskell can be thought of as generators, and this map function generates its first item by applying f to the first item of the input list. When it needs a second item, it just does the same thing again (without using extra space).

It turns out that map can be described in terms of foldr:

map f xs = foldr (\x ys -> f x : ys) [] xs

It's hard to tell by looking at it, but lazy evaluation kicks in because foldr can give f its first argument right away:

foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

Because the f defined by map can return the first item of the result list using solely the first parameter, the fold can operate lazily in constant space.
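To make the "generator" behaviour concrete, here is a minimal sketch. The name mapR is made up for this example; it is just the foldr definition of map from above. It only demonstrates that the head of the result is available without evaluating the rest of the input.

```haskell
-- map written with foldr, as in the definition above.
-- (mapR is a hypothetical name chosen for this sketch.)
mapR :: (a -> b) -> [a] -> [b]
mapR f = foldr (\x ys -> f x : ys) []

main :: IO ()
main = do
  -- Only the first three cells of the infinite input are ever demanded.
  print (take 3 (mapR (* 2) [1 ..]))              -- [2,4,6]
  -- The tail of the input is never touched, so undefined is harmless here.
  print (take 1 (mapR (+ 1) (0 : undefined)))     -- [1]
```

If foldr had to reach the end of the list before producing anything, neither line could return.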

Now, lazy evaluation does bite back. For instance, try running sum [1..1000000]. It yields a stack overflow. Why should it? It should just evaluate from left to right, right?

Let's look at how Haskell evaluates it:

foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

sum = foldl (+) 0

sum [1..1000000] = foldl (+) 0 [1..1000000]
                 = foldl (+) ((+) 0 1) [2..1000000]
                 = foldl (+) ((+) ((+) 0 1) 2) [3..1000000]
                 = foldl (+) ((+) ((+) ((+) 0 1) 2) 3) [4..1000000]
                   ...
                 = (+) ((+) ((+) (...) 999998) 999999) 1000000

Haskell is too lazy to perform the additions as it goes. Instead, it ends up with a tower of unevaluated thunks that have to be forced to get a number. The stack overflow occurs during this evaluation, since it has to recurse deeply to evaluate all the thunks.

Fortunately, there is a special function in Data.List called foldl' that operates strictly. foldl' (+) 0 [1..1000000] will not stack overflow. (Note: I tried replacing foldl with foldl' in your test, but it actually made it run slower.)
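A small sketch of the two variants side by side. The names sumLazy and sumStrict are made up for this example. Whether the lazy version actually overflows the stack depends on the GHC version, the optimization flags, and the runtime stack limit, so the lazy fold is only run on a short list here.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds the chain ((0 + 1) + 2) + ... before adding anything.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: forces the accumulator at every step,
-- so it stays a plain number instead of a growing thunk.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])  -- 500000500000
  print (sumLazy [1 .. 1000])       -- 500500
```

Both functions compute the same result. The difference is only in when the additions happen and how much memory the pending work occupies.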

168

answered Oct 12 '22 00:10

Joey Adams

EDIT: Upon looking at this problem again, I think all current explanations are somewhat insufficient so I've written a longer explanation.

The difference is in how foldl and foldr apply their reduction function. Looking at the foldr case, we can expand it as

foldr (\x -> ([x] ++)) [] [0..10000]   -- i.e. foldr a [] [0..10000]
[0] ++ foldr a [] [1..10000]
[0] ++ ([1] ++ foldr a [] [2..10000])
...

This list is processed by sum, which consumes it as follows:

sum = foldl' (+) 0
foldl' (+) 0 ([0] ++ ([1] ++ ... ++ [10000]))
foldl' (+) 0 (0 : [1] ++ ... ++ [10000])     -- get head of list from '++' definition
foldl' (+) 0 ([1] ++ [2] ++ ... ++ [10000])  -- add accumulator and head of list
foldl' (+) 0 (1 : [2] ++ ... ++ [10000])
foldl' (+) 1 ([2] ++ ... ++ [10000])
...

I've left out the details of the list concatenation, but this is how the reduction proceeds. The important part is that everything gets processed in order to minimize list traversals. The foldr only traverses the list once, the concatenations don't require continuous list traversals, and sum finally consumes the list in one pass. Critically, the head of the list is available from foldr immediately to sum, so sum can begin working immediately and values can be gc'd as they are generated. With fusion frameworks such as vector, even the intermediate lists will likely be fused away.

Contrast this to the foldl function:

b xs = ( ++xs) . (\y->[y])
foldl b [] [0..10000]
foldl b ( [0] ++ [] ) [1..10000]
foldl b ( [1] ++ ([0] ++ []) ) [2..10000]
foldl b ( [2] ++ ([1] ++ ([0] ++ [])) ) [3..10000]
...

Note that now the head of the list isn't available until foldl has finished. This means that the entire list must be constructed in memory before sum can begin to work. This is much less efficient overall. Running the two versions with +RTS -s shows miserable garbage collection performance from the foldl version.

This is also a case where foldl' will not help. The added strictness of foldl' doesn't change the way the intermediate list is created. The head of the list remains unavailable until foldl' has finished, so the result will still be slower than with foldr.
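The difference in availability of the head can be observed directly. This is a sketch using the a and b functions from the question. The infinite input list is my own addition, to make the point that foldr a never needs the end of the list.

```haskell
-- The two step functions from the question.
a :: a -> [a] -> [a]
a x = ([x] ++)

b :: [b] -> b -> [b]
b xs = (++ xs) . (\y -> [y])

main :: IO ()
main = do
  -- foldr a yields its head immediately, so it even works on an infinite list.
  print (take 3 (foldr a [] [0 ..]))   -- [0,1,2]
  -- foldl b has to consume the whole input before the head exists.
  -- As the expansion above shows, it also builds the list in reverse.
  print (foldl b [] [0 .. 4])          -- [4,3,2,1,0]
```

Replacing [0 .. 4] with [0 ..] in the second call would never return, because foldl cannot produce anything before it reaches the end of its input.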

I use the following rule to determine the best choice of fold:

For folds that are a reduction, use foldl' (e.g. this will be the only/final traversal)
Otherwise use foldr.
Don't use foldl.

In most cases foldr is the best fold function because the traversal direction is optimal for lazy evaluation of lists. It's also the only one capable of processing infinite lists. The extra strictness of foldl' can make it faster in some cases, but this is dependent on how you'll use that structure and how lazy it is.

answered Oct 11 '22 23:10

John L

I don't think anyone's actually said the real answer on this one yet, unless I'm missing something (which may well be true and welcomed with downvotes).

I think the biggest difference in this case is that foldr builds the list like this:

[0] ++ ([1] ++ ([2] ++ (... ++ [1000000])))

Whereas foldl builds the list like this:

((((([0] ++ [1]) ++ [2]) ++ ... ) ++ [999998]) ++ [999999]) ++ [1000000]

The difference is subtle, but notice that in the foldr version ++ always has only one list element as its left argument. With the foldl version, there are up to 999999 elements in ++'s left argument (on average around 500000), but only one element in the right argument.

However, ++ takes time proportional to the size of the left argument, as it has to walk through the entire left argument list to the end and then repoint that last element to the first element of the right argument (at best; perhaps it actually needs to do a copy). The right argument list is unchanged, so it doesn't matter how big it is.

That's why the foldl version is much slower. It's got nothing to do with laziness in my opinion.
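The cost model of ++ can be checked with a small sketch. appendCount, leftNested and rightNested are hypothetical helpers written for this example. appendCount follows the Haskell Report definition of ++ and additionally counts the cons cells it rebuilds. The sketch only illustrates the general cost of the two nestings when the whole result is forced; which nesting a particular fold produces depends on its step function.

```haskell
-- (++) as defined in the Haskell Report, instrumented to count the
-- cons cells of the left argument that it has to rebuild.
appendCount :: [a] -> [a] -> ([a], Int)
appendCount []       ys = (ys, 0)
appendCount (x : xs) ys =
  let (rest, n) = appendCount xs ys
  in  (x : rest, n + 1)

-- Cells rebuilt for (([0] ++ [1]) ++ [2]) ++ ... : the left argument grows.
leftNested :: Int -> Int
leftNested n = snd (foldl step ([], 0) [0 .. n - 1])
  where
    step (acc, total) x =
      let (acc', c) = appendCount acc [x]
      in  (acc', total + c)

-- Cells rebuilt for [0] ++ ([1] ++ ([2] ++ ...)) : the left argument is one cell.
rightNested :: Int -> Int
rightNested n = snd (foldr step ([], 0) [0 .. n - 1])
  where
    step x (acc, total) =
      let (acc', c) = appendCount [x] acc
      in  (acc', total + c)

main :: IO ()
main = do
  print (leftNested 1000)   -- 499500, i.e. n * (n - 1) / 2
  print (rightNested 1000)  -- 1000
```

The left-nested count grows quadratically with the list length, while the right-nested count grows linearly.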

answered Oct 12 '22 00:10

Clinton

The problem is that tail recursion optimization is a memory optimization, not an execution-time optimization!

Tail recursion optimization avoids the need to remember values for each recursive call.

So, foldl is in fact "good" and foldr is "bad".

For example, considering the definitions of foldr and foldl:

foldl f z [] = z
foldl f z (x:xs) = foldl f (z `f` x) xs

foldr f z [] = z
foldr f z (x:xs) = x `f` (foldr f z xs)

This is how the expression "foldl (+) 0 [1,2,3]" is evaluated:

foldl (+) 0 [1, 2, 3]
foldl (+) (0+1) [2, 3]
foldl (+) ((0+1)+2) [3]
foldl (+) (((0+1)+2)+3) [ ]
(((0+1)+2)+3)
((1+2)+3)
(3+3)
6

Note that foldl doesn't remember the values 0, 1, 2, ...; instead it lazily passes the whole expression (((0+1)+2)+3) as an argument and doesn't evaluate it until the last call of foldl, where it reaches the base case and returns the value passed as the second parameter (z), which hasn't been evaluated yet.

On the other hand, that's how foldr works:

foldr (+) 0 [1, 2, 3]
1 + (foldr (+) 0 [2, 3])
1 + (2 + (foldr (+) 0 [3]))
1 + (2 + (3 + (foldr (+) 0 [])))
1 + (2 + (3 + 0))
1 + (2 + 3)
1 + 5
6

The important difference here is that foldl evaluates the whole expression in the last call, avoiding the need to come back to remembered values, whereas foldr does not: foldr remembers one integer for each call and performs an addition in each call.

It is important to bear in mind that foldr and foldl are not always equivalent. For instance, try to compute these expressions in Hugs:

foldr (&&) True (False:(repeat True))

foldl (&&) True (False:(repeat True))

foldr and foldl are equivalent only under certain conditions described here
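The two expressions above behave differently because (&&) does not look at its second argument when the first one is False. A minimal sketch follows; the finite list in the second line is my own addition, used to show a case where both folds agree.

```haskell
main :: IO ()
main = do
  -- foldr stops at the first element: False && <rest of the fold> is False
  -- without ever evaluating the rest, even though the list is infinite.
  print (foldr (&&) True (False : repeat True))   -- False
  -- On a finite list the two folds give the same answer for (&&).
  print (foldl (&&) True [True, False, True]
           == foldr (&&) True [True, False, True])  -- True
  -- foldl (&&) True (False : repeat True) is deliberately not run here:
  -- foldl must reach the end of the list first, so it would never return.
```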

(sorry for my bad english)

answered Oct 12 '22 00:10

matiascelasco

For a, the [0.. 100000] list needs to be expanded right away so that foldr can start with the last element. Then as it folds things together, the intermediate results are

[100000]
[99999, 100000]
[99998, 99999, 100000]
...
[0.. 100000] -- i.e., the original list

Because nobody is allowed to change this list value (Haskell is a pure functional language), the compiler is free to reuse the value. The intermediate values, like [99999, 100000] can even be simply pointers into the expanded [0.. 100000] list instead of separate lists.

For b, look at the intermediate values:

[0]
[0, 1]
[0, 1, 2]
...
[0, 1, ..., 99999]
[0.. 100000]

Each of those intermediate lists can't be reused, because if you change the end of the list then you've changed any other values that point to it. So you're creating a bunch of extra lists that take time to build in memory. So in this case you spend a lot more time allocating and filling in these lists that are intermediate values.

Since you're just making a copy of the list, a runs faster because it starts by expanding the full list and then just keeps moving a pointer from the back of the list to the front.

answered Oct 11 '22 23:10

Harold L

Neither foldl nor foldr is tail optimized. It is only foldl'.

But in your case, using ++ with foldl' is not a good idea, because each successive evaluation of ++ will traverse the growing accumulator again and again.

answered Oct 12 '22 00:10

Hynek -Pichi- Vychodil
