Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any intermediate data structure created in list comprehensions

It seems like foldr does some kind of fusion with list comprehension, thus it requires less memory (11mb) allocation compared tofoldl (21mb) in this e.g.

myfunc = sum $ foldr g acc [ f x | x <- xs ]
f x = ..
g x y = ..

Could anyone explain how and why? Also how lazy evaluation help in this.

like image 988
vis Avatar asked Dec 09 '11 17:12

vis


3 Answers

A left fold cannot produce any output (part of the result) before it has traversed the entire list. Depending on what function you fold, that can build up a big data structure or a big thunk, which uses a lot of memory (it can run in constant memory if you fold e.g. (+) over a list of Int).

A right fold can, for appropriate functions (such that can produce a [partial] result without inspecting the second argument) produce their result incrementally, so that if the result is appropriately consumed and the input list is appropriately generated, the entire computation can run in small constant space. As sclv said, it reduces basically to a loop in those cases.

like image 65
Daniel Fischer Avatar answered Sep 19 '22 22:09

Daniel Fischer


We can desugar the comprehension as, essentially, map f xs. If you're compiling this then ghc should indeed be able to fuse the sum, the fold, and the map into a single pass: http://www.haskell.org/haskellwiki/Correctness_of_short_cut_fusion. But even if you're not, then laziness is your friend for memory usage. The list produced by the map is lazy -- f is only applied when demanded. And f will only be demanded when the foldr demands it. And since your foldr is clearly producing another (lazy) list, then each step of the fold is only demanded by sum in turn. So you still have each function applied in turn, but you don't need to produce the full intermediate data structures all at once. While you've written a wholemeal set of function compositions, the evaluation model will tend to treat this particular set of code, modulo a whole bunch of hand-waving, somewhat like a loop (though, without fusion, a loop with a fair amount of indirection).

like image 24
sclv Avatar answered Sep 20 '22 22:09

sclv


This is a feature of the GHC compiler. Basically, GHC can recognize when a list is used in a "pipeline", and can convert the whole construct into the equivalent of a while-loop in C that doesn't allocate a list at all.

The reason why this works with foldr and not foldl depends on the function g that you're using in your example. Since foldr, as opposed to foldl, accumulates the results of the function given as the parameter (aka: foldl needs the whole list before it can begin actually evaluating the function g, so it builds up a huge "thunk" of unevaluated functions and the final element in the list as its result - which is why it uses so much more memory in this case - while foldr can begin evaluating g as soon as it gets any list input), it is called "strict" in its accumulator, and certain assumptions can be made by the compiler that can lead to optimizations.

If, for instance, the function g yields a value that is a list, it can continue the aforementioned "pipeline" optimization strategy, basically treating the foldr like a map and making the whole construct (from list generation to list consumption) into a strict loop. This is only possible because the foldr yields exactly one list element for every list element it consumes, which foldl is not guaranteed to do (especially for infinite lists).

like image 20
dflemstr Avatar answered Sep 16 '22 22:09

dflemstr