Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reducing garbage-collection time while using large data structures in a functional language

How can you reduce garbage-collection time while using large data structures in a functional language?

(I'm using Racket but the question would apply to any functionally oriented language with a garbage collector.)

The core idiom of functional programming is that you design functions to work on a copy of the data, and return the processed data as a result, instead of mutating data structures from afar. You don't worry about the extra copying because a garbage collector comes by and recovers memory that's no longer in use.

That's great as far as it goes. But as I've started writing programs that handle larger data structures, I find that more of the total run time is being occupied by garbage collection (as in 25-35% rather than the 10-15% that I found typical with small structures).

That's not surprising, because more data is being copied for each function call, and thus there's more garbage to collect.

But it also makes speed improvements more difficult, since more of the run time is taken up by the garbage collector, which is essentially outside your control.

The obvious solution would be to avoid copying the data at all. But this puts you back into mutating data from afar, which contradicts the functional idiom. (Though I know this can be done to some degree in Racket by using boxed values, or even with parameters.)

Beyond that, three options occur to me:

  1. Design your data structures so that they encode information more compactly.
  2. Rather than passing whole data structures to functions, pull out the subset of data you need to process (though of course that assumes that cost of separating & rejoining the data subset saves enough garbage-collection time to be worth it).
  3. Where possible, combine multiple functions (which would ordinarily pass the large data structure from one to the next, copying it each time) into a single, tail-recursive function (which will not).

Are these effective approaches? Are there others I'm overlooking?

like image 392
Matthew Butterick Avatar asked Apr 06 '26 20:04

Matthew Butterick


2 Answers

There are functional data structures that are designed to reduce the cost of copying - for example, if a function mutates a branch of a tree, then the new tree would share the nodes from the unaffected branches and only the mutated branch would need to be copied.

Chris Okasaki's Purely Functional Data Structures is the best paper on this that I am aware of, but there is probably more recent research that I am unaware of (for example, the ctrie, that I only know of through Wikipedia).

like image 67
Zim-Zam O'Pootertoot Avatar answered Apr 08 '26 14:04

Zim-Zam O'Pootertoot


This isn’t a perfect answer, but it’s more on-topic than other resources I found. This paper, “Functional Data Structures in Typed Racket,” discusses alternative functional data structures that perform better than standard lists in certain contexts, and gives specific timing results that includes garbage-collection time (HT Greg Hendershott):

http://www.ccs.neu.edu/racket/pubs/sfp10-kth.pdf

And here is the code for the implementations:

https://github.com/takikawa/tr-pfds

like image 36
Matthew Butterick Avatar answered Apr 08 '26 15:04

Matthew Butterick



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!