Haskell foldl' poor performance with (++)

I have this code:

import Data.List

newList_bad  lst = foldl' (\acc x -> acc ++ [x*2]) [] lst
newList_good lst = foldl' (\acc x -> x*2 : acc) [] lst

These functions return lists with each element multiplied by 2:

*Main> newList_bad [1..10]
[2,4,6,8,10,12,14,16,18,20]
*Main> newList_good [1..10]
[20,18,16,14,12,10,8,6,4,2]

In GHCi:

*Main> sum $ newList_bad [1..15000]
225015000
(5.24 secs, 4767099960 bytes)
*Main> sum $ newList_good [1..15000]
225015000
(0.03 secs, 3190716 bytes)

Why does the newList_bad function run about 200 times slower than newList_good? I understand it's not a good solution to the task, but why is this innocent-looking code so slow?

And what is this "4767099960 bytes"? For such a simple operation, did Haskell really use 4 GiB??

After compilation:

C:\1>ghc -O --make test.hs
C:\1>test.exe
225015000
Time for sum (newList_bad [1..15000]) is 4.445889s
225015000
Time for sum (newList_good [1..15000]) is 0.0025005s
asked Nov 28 '22 by user2083421

2 Answers

There is much confusion about this issue. The usual reason given is that "repeatedly appending at the end of a list requires repeated traversals of the list, and is thus O(n^2)". But it would only be that simple under strict evaluation. Under lazy evaluation everything is delayed, which raises the question of whether those repeated traversals and appendings actually happen at all. The appending at the end is triggered by consuming at the front, and since we consume at the front the list keeps getting shorter, so the exact timing of these actions is unclear. The real answer is more subtle, and comes down to the specific reduction steps that happen under lazy evaluation.

The immediate culprit is that foldl' only forces its accumulator argument to weak head normal form - i.e. just far enough to expose the outermost data constructor. The functions involved here are:

(a:b)++c = a:(b++c)    -- does nothing with 'b', only pulls 'a' up
[]++c = c              -- so '++' only forces 1st elt from its left arg

foldl' f z [] = z
foldl' f z (x:xs) = let w=f z x in w `seq` foldl' f w xs

sum xs = sum_ xs 0     -- forces elts from its arg one by one
sum_ [] a = a
sum_ (x:xs) a = sum_ xs (a+x)
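
To see how little that forcing does, here is a small illustrative check (the list is made up just for the demo):

*Main> let xs = undefined : undefined :: [Int]
*Main> xs `seq` "forced only to the (:) constructor"
"forced only to the (:) constructor"

seq evaluates xs just far enough to expose the (:) constructor; neither the head nor the tail is touched. This is why foldl' cannot stop the ++ thunks from piling up inside the accumulator.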

With these definitions, the actual reduction sequence is (with g = foldl' f):

sum $ foldl' (\acc x-> acc++[x*2]) []          [a,b,c,d,e]
sum $ g  []                                    [a,b,c,d,e]
      g  [a*2]                                   [b,c,d,e]
      g  (a*2:([]++[b*2]))                         [c,d,e]
      g  (a*2:(([]++[b*2])++[c*2]))                  [d,e]
      g  (a*2:((([]++[b*2])++[c*2])++[d*2]))           [e]
      g  (a*2:(((([]++[b*2])++[c*2])++[d*2])++[e*2]))   []
sum $ (a*2:(((([]++[b*2])++[c*2])++[d*2])++[e*2]))

Notice that we've only performed O(n) steps so far. a*2 is immediately available here for sum to consume, but b*2 is not. What's left is the left-nested structure of ++ expressions. The rest is best explained in this answer by Daniel Fischer. The gist of it is that getting b*2 out takes O(n-1) steps - and the structure left in the wake of that access is still left-nested, so the next access takes O(n-2) steps, and so on - the classic O(n^2) behavior. So the real reason is that ++ doesn't force or rearrange its arguments enough to be efficient.
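
For concreteness, here is that next access spelled out, using the ++ equations above:

  ((([]++[b*2])++[c*2])++[d*2])++[e*2]
= (([b*2]++[c*2])++[d*2])++[e*2]         -- []++c = c
= ((b*2:([]++[c*2]))++[d*2])++[e*2]      -- (a:b)++c = a:(b++c)
= (b*2:(([]++[c*2])++[d*2]))++[e*2]
= b*2:((([]++[c*2])++[d*2])++[e*2])      -- b*2 finally exposed; the tail is left-nested again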

This is actually counter-intuitive. We could expect lazy evaluation to magically "do it" for us here. After all, we are only expressing our intent to add [x*2] to the end of the list in the future; we don't actually do it right away. So the timing is off, but it could be made right: as the list is accessed, new elements would be added into it and consumed right away, if the timing were right. If c*2 were added into the list (space-wise) right after b*2, say just before (in time) b*2 is consumed, then every traversal/access would be O(1).

This is achieved with the so-called "difference list" technique:

newlist_dl lst = foldl' (\z x -> z . (x*2 :)) id lst []   -- the final [] turns the built function into a list

which, if you think about it for a moment, looks exactly the same as your ++[x*2] version. It expresses the same intent, and leaves a left-nested structure too.
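
In GHCi it produces the same list as newList_bad, in the same order (output shown assuming the definition above, with the final [] applied):

*Main> newlist_dl [1..10]
[2,4,6,8,10,12,14,16,18,20]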

The difference, as explained in that same answer by Daniel Fischer, is that a (.) chain, when first forced, rearranges itself into a right-nested ($) structure [1] in O(n) steps, after which each access is O(1) and the timing of the appends is optimal, exactly as described in the paragraph above, so we're left with overall O(n) behaviour.
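
Schematically, for three elements, where each line is one unfolding of (f . g) x = f (g x):

(((id . (a*2:)) . (b*2:)) . (c*2:)) []
= ((id . (a*2:)) . (b*2:)) ((c*2:) [])
= (id . (a*2:)) ((b*2:) ((c*2:) []))
= id ((a*2:) ((b*2:) ((c*2:) [])))
= a*2 : (b*2 : (c*2 : []))

After this one-time O(n) unwinding, every element sits in a plain right-nested list and is reachable in O(1) as sum consumes it.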


[1] which is kind of magical, but it does happen. :)

answered Dec 05 '22 by Will Ness

Classic list behavior.

Recall:

(:)  -- O(1): prepends a single element
(++) -- O(n): traverses its whole left argument

So you are creating an O(n^2) algorithm instead of an O(n) one.
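
A quick way to see the total cost under this accounting: the k-th append copies the k-1 elements accumulated so far, so the total work is 0 + 1 + ... + (n-1) = n(n-1)/2 steps, which is O(n^2).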

For this common case of building a list by appending incrementally, use a difference list (e.g. the dlist package), or just build the list in reverse and reverse it once at the end.
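
A minimal sketch of the reverse-at-the-end option (newList_rev is a name invented here, not from the question):

import Data.List (foldl')

-- Build the result backwards with O(1) conses, then reverse once at the end:
-- n conses plus one O(n) reverse is O(n) overall.
newList_rev :: Num a => [a] -> [a]
newList_rev = reverse . foldl' (\acc x -> x*2 : acc) []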

answered Dec 05 '22 by Don Stewart