I found the haskell wiki page on space leaks, which claims to list examples of real-world leaks, which it doesn't. It doesn't really say what a space leak is; it just links to the page for memory leaks.
What's a space leak?
A space leak occurs when a computer program uses more memory than necessary. Advanced language features such as lazy evaluation or closures lead to more complex memory layout, making it harder to predict what memory looks like, potentially leading to space leaks.
Research shows that healthy aging individuals have a very small leak in their blood-brain barrier. This leakage is associated with some measures of the benign forgetfulness of aging, considered by most scientists to be normal.
Memory leaks in the kernel level lead to serious system stability issues. Kernel memory is very limited compared to user land memory and should be handled cautiously. Memory is allocated but never freed.
Memory leaks don't result in physical or permanent damage. Since it's a software issue, it will slow down the applications or even your whole system. However, a program taking up a lot of RAM space doesn't always mean its memory is leaking somewhere. The program you're using may really need that much space.
As noted in @Rasko's answer, a space leak refers to a situation where a program or specific computation uses more (usually much more) memory than is necessary for the computation and/or expected by the programmer.
Haskell programs tend to be particularly susceptible to space leaks, mostly because of the lazy evaluation model (sometimes complicated by the way IO interacts with this model) and the highly abstract nature of the language which can make it difficult for a programmer to determine exactly how a particular computation is likely to be performed.
It helps to consider a specific example. This Haskell program:
main = print $ sum [1..1000000000]
is an idiomatic way to sum the first billion integers. Compiled with -O2
, it runs in a few seconds in constant memory (a few megabytes, basically the runtime overhead).
Now, any programmer would expect a program to sum the first billion integers should run without chewing up memory, but it's actually a little surprising that this Haskell version is well behaved. After all, read literally, it constructs a list of a billion integers before summing them up, so it ought to require at least a few gigabytes (just for storage for the billion integers, not to mention the overhead of a Haskell linked list).
However, lazy evaluation ensures that the list is only generated as it's needed and -- equally importantly -- optimizations performed by the compiler ensure that as list elements are added to the accumulating sum, the program recognizes they are no longer needed and allows them to be garbage collected instead of keeping them around until the end of the computation. So, at any point during the computation, only a sliding "window" into the middle of the list needs to be kept in memory -- earlier elements have been discarded, and later elements are yet to be lazily computed. (In fact, the optimizations go further than this: no list is even constructed, but this is far from obvious to the programmer.)
Soooo... Haskell programmers get used to the idea that tossing around giant (or even infinite) data structures will "just work" with computations automatically using only the memory they need.
But, a minor change to the program, like also printing the length of the list as proof of all the hard work we are doing:
main = let vals = [1..1000000000]
in print (sum vals, length vals)
suddenly causes space usage to explode to dozens of gigabytes (or in the case of my laptop, to about 13Gigs before it starts swapping hopelessly and I kill it).
This is a space leak. Calculating the sum and length of this list are obviously things that can be done in constant space using a "sliding window" view into the list, but the above program uses much more memory than needed. The reason, it turns out, is that once the list has been given a name vals
that's used in two places, the compiler no longer allows the "used" elements to be immediately discarded. If the sum vals
is evaluated first, the list is lazily generated and summed, but the entire, giant list is then kept around until length vals
can be evaluated.
As a more practical example, you might write a simple program to count words and characters in a file:
main = do txt <- getContents
print (length txt, length (words txt))
This works fine on small test files up to a couple megabytes, but it's noticeably sluggish on 10meg file, and if you try to run it on a 100meg file, it'll slowly but surely start gobbling up all available memory. Again, the problem is that -- even though the file contents are read lazily into txt
-- because txt
is used twice, the entire contents are read into memory as a Haskell String
type (a memory-inefficient representation of large blocks of text) when, say, length txt
is evaluated, and none of that memory can be freed until length (words txt)
has also been computed.
Note that:
main = do txt <- getContents
print $ length txt
and:
main = do txt <- getContents
print $ length (words txt)
both run quickly in constant space even on big files.
As a side note, fixing the above space leak normally involves rewriting the computation so the characters and words are counted with one pass through the contents, so the compiler can determine that the contents of the file that have already been processed do not need to be kept around in memory until the end of the computation. One possible solution is:
{-# LANGUAGE BangPatterns #-}
import Data.List
import Data.Char
charsWords :: String -> (Int, Int)
charsWords str = let (_, chrs, wrds) = foldl' step (False, 0, 0) str
in (chrs, wrds)
where step (inWord, cs, ws) c =
let !cs' = succ cs
!ws' = if not inWord && inWord' then succ ws else ws
!inWord' = not (isSpace c)
in (inWord', cs', ws')
main = do txt <- getContents
print $ charsWords txt
The complexity of this solution (use of bang (!
) patterns and an explicit fold instead of length
and words
) illustrates how tough space leaks can be, especially for new Haskell programmers. And it's not at all obvious that using foldl'
instead of foldl
makes no difference (but using foldr
or foldr'
would be a disaster!), that the bangs before cs'
and ws'
are critical to avoid a space leak, but that the bang before inWord'
isn't (though it slightly improves performance), etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With