I'm curious as to why Haskell implementations use a GC.
I can't think of a case where GC would be necessary in a pure language. Is it just an optimization to reduce copying, or is it actually necessary?
I'm looking for example code that would leak if a GC wasn't present.
The Haskell runtime system employs a generational garbage collector (GC) with two generations 2. Generations are numbered starting with the youngest generation at zero. Values are always allocated in a special part of the youngest generation called the nursery.
It is not strictly necessary. Given enough time and effort you can always translate a program that depends on garbage collection to one that doesn't.
Overview. Many programming languages require garbage collection, either as part of the language specification (e.g., RPL, Java, C#, D, Go, and most scripting languages) or effectively for practical implementation (e.g., formal languages like lambda calculus). These are said to be garbage-collected languages.
The fundamental challenge garbage collection addresses is freeing memory that is not, or is not known to be, used in a stack-like fashion. It is most especially useful when there is no clear place in the source code that can be pinpointed as the end of the object's lifetime.
As others have already pointed out, Haskell requires automatic, dynamic memory management: automatic memory management is necessary because manual memory management is unsafe; dynamic memory management is necessary because for some programs, the lifetime of an object can only be determined at runtime.
For example, consider the following program:
main = loop (Just [1..1000]) where
loop :: Maybe [Int] -> IO ()
loop obj = do
print obj
resp <- getLine
if resp == "clear"
then loop Nothing
else loop obj
In this program, the list [1..1000]
must be kept in memory until the user types "clear"; so the lifetime of this must be determined dynamically, and this is why dynamic memory management is necessary.
So in this sense, automated dynamic memory allocation is necessary, and in practice this means: yes, Haskell requires a garbage collector, since garbage collection is the highest-performance automatic dynamic memory manager.
Although a garbage collector is necessary, we might try to find some special cases where the compiler can use a cheaper memory management scheme than garbage collection. For instance, given
f :: Integer -> Integer
f x = let x2 = x*x in x2*x2
we might hope for the compiler to detect that x2
can safely be deallocated when f
returns (rather than waiting for the garbage collector to deallocate x2
). Essentially, we are asking that the compiler perform escape analysis to convert allocations in to garbage-collected heap to allocations on the stack wherever possible.
This is not too unreasonable to ask for: the jhc haskell compiler does this, although GHC does not. Simon Marlow says that GHC's generational garbage collector makes escape analysis mostly unnecessary.
jhc actually uses a sophisticated form of escape analysis known as region inference. Consider
f :: Integer -> (Integer, Integer)
f x = let x2 = x * x in (x2, x2+1)
g :: Integer -> Integer
g x = case f x of (y, z) -> y + z
In this case, a simplistic escape analysis would conclude that x2
escapes from f
(because it is returned in the tuple), and hence x2
must be allocated on the garbage-collected heap. Region inference, on the other hand, is able to detect that x2
can be deallocated when g
returns; the idea here is that x2
should be allocated in g
's region rather than f
's region.
While region inference is helpful in certain cases as discussed above, it appears to be difficult to reconcile effectively with lazy evaluation (see Edward Kmett's and Simon Peyton Jones' comments). For instance, consider
f :: Integer -> Integer
f n = product [1..n]
One might be tempted to allocate the list [1..n]
on the stack and deallocate it after f
returns, but this would be catastrophic: it would change f
from using O(1) memory (under garbage collection) to O(n) memory.
Extensive work was done in the 1990s and early 2000s on region inference for the strict functional language ML. Mads Tofte, Lars Birkedal, Martin Elsman, Niels Hallenberg have written a quite readable retrospective on their work on region inference, much of which they integrated into the MLKit compiler. They experimented with purely region-based memory management (i.e. no garbage collector) as well as hybrid region-based/garbage-collected memory management, and reported that their test programs ran "between 10 times faster and 4 times slower" than pure garbage-collected versions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With