Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Garbage collection in Haskell & parallel computations

Most of languages, using garbage collectors (possibly all of them), have one major issue related to parallel computations: garbage collector has to stop all running threads in order to delete unused objects. Haskell has garbage collector too. But, due to purity, Haskell guarantees that no computation changes the original data, it produces a copy instead, and then makes the changes. I suppose, in that way, GC does not has to stop all threads to do its job. I'm just curious: does Haskell have the same issue with garbage collection or not?

like image 951
Dmitry Bespalov Avatar asked Mar 03 '12 16:03

Dmitry Bespalov


People also ask

Why is Haskell garbage collected?

Haskell computations produce a lot of memory garbage - much more than conventional imperative languages. It's because data are immutable so the only way to store every next operation's result is to create new values. In particular, every iteration of a recursive computation creates a new value.

What is garbage collection in programming?

Garbage collection (GC) is a memory recovery feature built into programming languages such as C# and Java. A GC-enabled programming language includes one or more garbage collectors (GC engines) that automatically free up memory space that has been allocated to objects no longer needed by the program.

Why do functional languages need garbage collectors?

The fundamental challenge garbage collection addresses is freeing memory that is not, or is not known to be, used in a stack-like fashion. It is most especially useful when there is no clear place in the source code that can be pinpointed as the end of the object's lifetime.


1 Answers

GHC's garbage collector is parallel but not concurrent. That means that it can use all threads to perform garbage collection, but it has to stop all threads to do that. Concurrent garbage collection is much harder to implement (and usually has a larger performance overhead).

It is somewhat ironic that Haskell does indeed use lots of mutable objects, namely thunks (unevaluated expressions). Mutable objects cannot freely be duplicated (and even for immutable objects too much duplication should be kept in check).

For programs running on multiple cores having a truly concurrent collector would be nice, but you can also get decent benefits by doing heap-local garbage collection. The idea is that data that is not shared between multiple CPUs can be collected only by the owning CPU. This is typically the case for short-lived data. The Simons have done some recent work in this area. See their paper "Multicore Garbage Collection with Local Heaps" (PDF). This paper also shows some ways how you can exploit immutability in a way similar to what you propose.

Edit: I forgot to mention that Erlang basically does exactly what you propose. Each Erlang process has its own heap and sending a message copies the data from one process to the other. For this reason each Erlang process can do its own GC independently of all the other processes. (The downside is that Erlang doesn't give you shared memory concurrency.)

like image 69
nominolo Avatar answered Oct 17 '22 09:10

nominolo