
Writing a Memory Manager in a managed language?

The Jikes RVM, a metacircular Java runtime, seems to have its allocator/collector (MMTk) written in Java.

How does it work, then, when your garbage collector requires a garbage collector to run? From looking at the code, I didn't see MMTk limiting itself to any particular subset of Java, but it seems to me that if the code that is meant to allocate managed memory itself needs to allocate managed memory to run, it will recurse until it blows up.

But clearly MMTk works, and apparently some other projects are using it too. How is writing a memory allocator and GC in a managed language like Java even possible?

Li Haoyi asked Nov 12 '22 07:11



1 Answer

There are quite a few instances of this being done. I'm not familiar with Jikes' implementation, but I have read up on both Java-based designs and PyPy. The thing most of them have in common is separating things into two levels: runtime/interpreter level and application level. Each level has its own objects, exceptions, execution flow, etc.

When you think about it abstractly, it's actually easier to understand. The runtime layer might be written in a memory-managed language or not, low-level or high-level, imperative or functional. That doesn't really matter. What matters is its function: what it does. It's eventually translated into machine code, and all machine code is the same to a machine. That code takes opcodes and data as input, performs predefined actions based on them, manages resources, etc.

The application layer is contained in those opcodes. It represents another overall function that takes input, processes it and produces output. These steps might also have a memory management routine built into them. The runtime doesn't care what the opcodes are doing so long as they're valid; it just executes them. In theory, you can write a GC language that contains a GC language that contains a GC language... and so on till you hit ENIAC speeds. ;)

I'd say the differentiator in designs like Jikes is their efficiency (and sometimes flexibility). It's not enough to write a GC, interpreter or runtime in a GC language. The result must be usable, which often means fast. There might also need to be provisions for native code integration at the application layer. Most of the real work goes into that stuff. Just running a GC in a GC language is the truly easy part: that's basically just labeling and managing objects at different abstraction levels.
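To make the "easy part" concrete, here's a minimal sketch of a mark-sweep collector written in plain Java. It is not how MMTk or Jikes works; `ToyHeap` and all its methods are made up for illustration. The application-level "heap" is just an `int[]` with two reference fields per object, and application-level "addresses" are slot indices. The collector's own bookkeeping (the work stack, the mark bits) consists of ordinary runtime-level Java objects that the host JVM manages.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy application-level heap with a mark-sweep collector written in ordinary Java. */
public class ToyHeap {
    static final int NULL = -1;   // application-level null reference
    static final int FIELDS = 2;  // assumption: every object has exactly two reference fields

    // Application-level memory. To the host JVM these are just plain arrays.
    private final int[] fields;
    private final boolean[] allocated;
    private final boolean[] marked;

    public ToyHeap(int capacity) {
        fields = new int[capacity * FIELDS];
        allocated = new boolean[capacity];
        marked = new boolean[capacity];
    }

    /** Allocates an application-level object; returns its slot index, or NULL if the heap is full. */
    public int alloc() {
        for (int i = 0; i < allocated.length; i++) {
            if (!allocated[i]) {
                allocated[i] = true;
                fields[i * FIELDS] = NULL;
                fields[i * FIELDS + 1] = NULL;
                return i;
            }
        }
        return NULL;
    }

    public void setField(int obj, int field, int target) {
        fields[obj * FIELDS + field] = target;
    }

    public boolean isLive(int obj) {
        return allocated[obj];
    }

    /** Marks everything reachable from the roots, frees the rest, and returns the number freed. */
    public int collect(int... roots) {
        Arrays.fill(marked, false);
        Deque<Integer> work = new ArrayDeque<>(); // runtime-level object, owned by the host GC
        for (int r : roots) {
            if (r != NULL) work.push(r);
        }
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (marked[obj]) continue;
            marked[obj] = true;
            for (int f = 0; f < FIELDS; f++) {
                int child = fields[obj * FIELDS + f];
                if (child != NULL && !marked[child]) work.push(child);
            }
        }
        int freed = 0;
        for (int i = 0; i < allocated.length; i++) {
            if (allocated[i] && !marked[i]) {
                allocated[i] = false;
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(8);
        int a = heap.alloc(), b = heap.alloc(), c = heap.alloc();
        heap.setField(a, 0, b);                           // a -> b; c is unreachable
        System.out.println("freed = " + heap.collect(a)); // prints "freed = 1"
    }
}
```

The two levels never get confused because they use different kinds of references: the collector only traces `int` indices into its own arrays, while the host GC only sees the arrays and the `Deque`. The recursion the question worries about doesn't arise here, since the toy collector never allocates from the heap it is collecting.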

Note: If you co-design the GC language and the GC-based runtime, you can take advantage of this for performance and simplicity boosts. For instance, instead of having two full garbage collectors running, one on top of the other, you can have a smarter GC on the bottom with a thin GC on top. The smarter GC knows the difference between runtime and application level, making sure to operate on them separately and not mix them improperly. The thin top GC essentially just makes procedure calls to the main GC or labels objects in a way the main GC understands, which doesn't add a ton of overhead.
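A rough sketch of that layering, under heavy assumptions: `LayeredGc`, `MainGc`, `AppGc` and `Level` are hypothetical names, not taken from Jikes, MMTk or PyPy. The bottom collector labels every object with its level, refuses application-to-runtime references, and can sweep one level while leaving the other alone. The top "GC" contains no tracing logic; it only forwards calls with the right label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LayeredGc {
    enum Level { RUNTIME, APPLICATION }

    /** The "smart" bottom collector: one heap, every object labeled with its level. */
    static class MainGc {
        static final class Obj {
            final Level level;
            final List<Obj> refs = new ArrayList<>();
            boolean marked;
            Obj(Level level) { this.level = level; }
        }

        private final List<Obj> heap = new ArrayList<>();

        Obj alloc(Level level) {
            Obj o = new Obj(level);
            heap.add(o);
            return o;
        }

        /** Rejects application -> runtime references so the levels don't mix improperly. */
        void link(Obj from, Obj to) {
            if (from.level == Level.APPLICATION && to.level == Level.RUNTIME) {
                throw new IllegalArgumentException("application object may not reference a runtime object");
            }
            from.refs.add(to);
        }

        /** Mark-sweep over one level only; objects of the other level act as extra roots. */
        int collect(Level level, List<Obj> roots) {
            for (Obj o : heap) o.marked = false;
            for (Obj o : heap) {
                if (o.level != level) mark(o); // never freed here, and whatever they reference survives
            }
            for (Obj r : roots) mark(r);
            int before = heap.size();
            heap.removeIf(o -> o.level == level && !o.marked);
            return before - heap.size();
        }

        private void mark(Obj o) {
            if (o.marked) return;
            o.marked = true;
            for (Obj child : o.refs) mark(child);
        }

        int size() { return heap.size(); }
    }

    /** The "thin" top GC: it only forwards to the main GC with the APPLICATION label. */
    static class AppGc {
        private final MainGc main;
        AppGc(MainGc main) { this.main = main; }
        MainGc.Obj alloc() { return main.alloc(Level.APPLICATION); }
        int collect(List<MainGc.Obj> roots) { return main.collect(Level.APPLICATION, roots); }
    }

    public static void main(String[] args) {
        MainGc main = new MainGc();
        AppGc app = new AppGc(main);
        main.alloc(Level.RUNTIME);        // a runtime-level object, e.g. an interpreter table
        MainGc.Obj kept = app.alloc();
        app.alloc();                      // application-level garbage
        int freed = app.collect(Arrays.asList(kept));
        System.out.println("freed = " + freed + ", remaining = " + main.size());
        // prints "freed = 1, remaining = 2"
    }
}
```

Treating the other level's objects as roots is a deliberately conservative choice to keep the sketch safe; a real design would likely track cross-level references more precisely.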

The thing about these kinds of projects is that each one is a bit different. The nature of the target and implementation languages dictates a lot of the design choices. That gives you plenty of opportunities to get creative with how you solve the problem.

Nick P answered Dec 26 '22 09:12
