How can you reduce garbage-collection time while using large data structures in a functional language?
(I'm using Racket but the question would apply to any functionally oriented language with a garbage collector.)
The core idiom of functional programming is that you design functions to work on a copy of the data, and return the processed data as a result, instead of mutating data structures from afar. You don't worry about the extra copying because a garbage collector comes by and recovers memory that's no longer in use.
That's great as far as it goes. But as I've started writing programs that handle larger data structures, I find that more of the total run time is being occupied by garbage collection (as in 25-35% rather than the 10-15% that I found typical with small structures).
That's not surprising, because more data is being copied for each function call, and thus there's more garbage to collect.
But it also makes speed improvements more difficult, since more of the run time is taken up by the garbage collector, which is essentially outside your control.
The obvious solution would be to avoid copying the data at all. But this puts you back into mutating data from afar, which contradicts the functional idiom. (Though I know this can be done to some degree in Racket by using boxed values, or even with parameters.)
Beyond that, three options occur to me:
Are these effective approaches? Are there others I'm overlooking?
There are functional data structures that are designed to reduce the cost of copying - for example, if a function mutates a branch of a tree, then the new tree would share the nodes from the unaffected branches and only the mutated branch would need to be copied.
Chris Okasaki's Purely Functional Data Structures is the best paper on this that I am aware of, but there is probably more recent research that I am unaware of (for example, the ctrie, that I only know of through Wikipedia).
This isn’t a perfect answer, but it’s more on-topic than other resources I found. This paper, “Functional Data Structures in Typed Racket,” discusses alternative functional data structures that perform better than standard lists in certain contexts, and gives specific timing results that includes garbage-collection time (HT Greg Hendershott):
http://www.ccs.neu.edu/racket/pubs/sfp10-kth.pdf
And here is the code for the implementations:
https://github.com/takikawa/tr-pfds
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With