What's a space leak?

Tags: memory-leaks, haskell

I found the Haskell wiki page on space leaks, which claims to list examples of real-world leaks but doesn't. It doesn't really say what a space leak is, either; it just links to the page for memory leaks.

425

asked Sep 01 '17 20:09

Filip Haglund

1 Answer

As noted in @Rasko's answer, a space leak refers to a situation where a program or specific computation uses more (usually much more) memory than is necessary for the computation and/or expected by the programmer.

Haskell programs tend to be particularly susceptible to space leaks, mostly because of the lazy evaluation model (sometimes complicated by the way IO interacts with this model) and the highly abstract nature of the language which can make it difficult for a programmer to determine exactly how a particular computation is likely to be performed.

It helps to consider a specific example. This Haskell program:

main = print $ sum [1..1000000000]

is an idiomatic way to sum the first billion integers. Compiled with -O2, it runs in a few seconds in constant memory (a few megabytes, basically the runtime overhead).

Now, any programmer would expect that a program summing the first billion integers should run without chewing up memory, but it's actually a little surprising that this Haskell version is well behaved. After all, read literally, it constructs a list of a billion integers before summing them up, so it ought to require at least a few gigabytes (just for storage for the billion integers, not to mention the overhead of a Haskell linked list).

However, lazy evaluation ensures that the list is only generated as it's needed and -- equally importantly -- optimizations performed by the compiler ensure that as list elements are added to the accumulating sum, the program recognizes they are no longer needed and allows them to be garbage collected instead of keeping them around until the end of the computation. So, at any point during the computation, only a sliding "window" into the middle of the list needs to be kept in memory -- earlier elements have been discarded, and later elements are yet to be lazily computed. (In fact, the optimizations go further than this: no list is even constructed, but this is far from obvious to the programmer.)
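As a rough illustration of that last point, the optimized program behaves much like a hand-written accumulating loop. The sketch below is only an assumption about the general shape of such a loop, not GHC's actual output; the name sumTo is made up here, and it uses Int where the original program defaults to Integer.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | Sum the integers from 1 to n with an explicit loop (hypothetical helper).
-- No list is built: the only live data are the accumulator and the counter.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    -- The bang forces the running total at every step, so it never
    -- grows into a chain of unevaluated additions.
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```

Swapping the list version for print (sumTo 1000000000) should likewise run in constant memory, which is the behavior the list version gets for free from the optimizer.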

Soooo... Haskell programmers get used to the idea that tossing around giant (or even infinite) data structures will "just work" with computations automatically using only the memory they need.

But, a minor change to the program, like also printing the length of the list as proof of all the hard work we are doing:

main = let vals = [1..1000000000]
       in print (sum vals, length vals)

suddenly causes space usage to explode to dozens of gigabytes (or, in the case of my laptop, to about 13 GB before it starts swapping hopelessly and I kill it).

This is a space leak. Calculating the sum and length of this list are obviously things that can be done in constant space using a "sliding window" view into the list, but the above program uses much more memory than needed. The reason, it turns out, is that once the list has been given a name vals that's used in two places, the compiler no longer allows the "used" elements to be immediately discarded. If the sum vals is evaluated first, the list is lazily generated and summed, but the entire, giant list is then kept around until length vals can be evaluated.
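One way to avoid this particular leak is to compute both results in a single traversal, so that nothing holds on to the start of the list. The following is a minimal sketch under that assumption, not the only possible fix; sumAndLength is a hypothetical helper name, and it works on Int rather than the defaulted Integer.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- | Sum and length of a list in one pass (hypothetical helper).
sumAndLength :: [Int] -> (Int, Int)
sumAndLength = foldl' step (0, 0)
  where
    -- foldl' only forces the tuple itself; the bangs additionally force
    -- both components on the next step, so neither the running sum nor
    -- the running count piles up as a chain of unevaluated thunks.
    step (!s, !n) x = (s + x, n + 1)
```

With this helper, main = print (sumAndLength [1..1000000000]) walks the list once, so list cells that have already been consumed can be garbage collected as the fold moves along.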

As a more practical example, you might write a simple program to count words and characters in a file:

main = do txt <- getContents
          print (length txt, length (words txt))

This works fine on small test files up to a couple of megabytes, but it's noticeably sluggish on a 10 MB file, and if you try to run it on a 100 MB file, it'll slowly but surely start gobbling up all available memory. Again, the problem is that -- even though the file contents are read lazily into txt -- because txt is used twice, the entire contents are read into memory as a Haskell String type (a memory-inefficient representation of large blocks of text) when, say, length txt is evaluated, and none of that memory can be freed until length (words txt) has also been computed.

Note that:

main = do txt <- getContents
          print $ length txt

and:

main = do txt <- getContents
          print $ length (words txt)

both run quickly in constant space even on big files.

As a side note, fixing the above space leak normally involves rewriting the computation so the characters and words are counted with one pass through the contents, so the compiler can determine that the contents of the file that have already been processed do not need to be kept around in memory until the end of the computation. One possible solution is:

{-# LANGUAGE BangPatterns #-}

import Data.List
import Data.Char

charsWords :: String -> (Int, Int)
charsWords str = let (_, chrs, wrds) = foldl' step (False, 0, 0) str
                 in (chrs, wrds)
  where step (inWord, cs, ws) c =
          let !cs' = succ cs
              !ws' = if not inWord && inWord' then succ ws else ws
              !inWord' = not (isSpace c)
          in (inWord', cs', ws')

main = do txt <- getContents
          print $ charsWords txt

The complexity of this solution (use of bang (!) patterns and an explicit fold instead of length and words) illustrates how tough space leaks can be, especially for new Haskell programmers. And it's not at all obvious that using foldl' instead of foldl makes no difference (but using foldr or foldr' would be a disaster!), that the bangs before cs' and ws' are critical to avoid a space leak, but that the bang before inWord' isn't (though it slightly improves performance), etc.
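A variation that some find easier to reason about, offered here as an assumption rather than as part of the solution above, is to keep the fold state in a data type with strict fields instead of a tuple with bang patterns. The names Counts and charsWordsStrict are made up for this sketch.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

-- | Hypothetical accumulator: "inside a word" flag, character count, word count.
-- The strictness annotations mean every field is evaluated whenever a
-- Counts value is evaluated, so no thunks can hide inside it.
data Counts = Counts !Bool !Int !Int

-- | Count characters and words in one pass over the input.
charsWordsStrict :: String -> (Int, Int)
charsWordsStrict str =
  let Counts _ chrs wrds = foldl' step (Counts False 0 0) str
  in (chrs, wrds)
  where
    step (Counts inWord cs ws) c =
      let inWord' = not (isSpace c)
          -- A new word starts when we move from whitespace to non-whitespace.
          ws' = if not inWord && inWord' then ws + 1 else ws
      in Counts inWord' (cs + 1) ws'
```

Here foldl' evaluates the accumulator at each step, and the strict fields take over the job done by the bangs on cs' and ws' in the tuple version, so the strictness lives in the type rather than in each binding.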

175

answered Oct 11 '22 01:10

K. A. Buhr
