The generic problem
Suppose you are coding a system that consists of a graph, plus graph rewrite rules that can fire depending on the configuration of neighboring nodes. That is, you have a dynamic graph that grows and shrinks unpredictably at runtime. If you naively use malloc, new nodes end up at random positions in memory; after enough time, your heap is pointer spaghetti, giving you terrible cache efficiency. Is there any lightweight, incremental technique to make nodes that wire together stay close together in memory?
What I tried
The only thing I could think of is embedding the nodes in a Cartesian space and running a physical spring simulation that repels/attracts nodes. That would keep wired nodes together, but it looks silly, and I suspect the overhead of the simulation would outweigh the cache-efficiency gains.
The solid example
This is the system I'm trying to implement. This is a brief snippet of the code I'm trying to optimize in C. This repo is a prototypical, working implementation in JS, with terrible cache efficiency (compounded by the inefficiency of the language itself). This video shows the system in action graphically.
What you are looking to solve is the Linear Arrangement Problem. Finding an optimal arrangement is NP-hard, but good approximations exist. Here is a paper which should be a good place to start.
You might look at this in terms of halfspace garbage collection. This isn't hard to implement (I've done it for an interpreter), particularly since you're only doing it for fixed-size node structures. Allocate from one large block (called a halfspace) of memory. When it gets too full or fragmented, stop and copy everything to the other halfspace (which you can also make bigger). The trick is updating all the pointers. For this there is a very elegant and efficient algorithm called scan copy. There's a nice discussion of it at Cornell. It essentially traverses the graph breadth-first, copying as it goes, without any extra space beyond what you're copying. A nice property of the algorithm is that breadth-first levels end up adjacent after each copy. If that is a good enough level of locality, you'll get it very efficiently with this method.
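In case it's useful, here is a minimal sketch of that idea in C, assuming fixed-size nodes with a small, fixed number of outgoing edges. The Node layout, the NCHILD constant, and the heap_init/alloc_node/compact names are placeholders of mine, not taken from any existing code or library.

    /* Halfspace copying ("scan copy") for fixed-size graph nodes.
       All names and the Node layout are illustrative placeholders. */
    #include <stdlib.h>
    #include <string.h>

    #define NCHILD 2

    typedef struct Node {
        struct Node *child[NCHILD]; /* outgoing edges */
        struct Node *forward;       /* forwarding pointer used during a copy */
        int payload;
    } Node;

    static Node *from_space, *to_space; /* the two halfspaces */
    static size_t capacity;             /* nodes per halfspace */
    static size_t used;                 /* nodes allocated in from_space */

    void heap_init(size_t nodes_per_space)
    {
        capacity   = nodes_per_space;
        from_space = calloc(capacity, sizeof(Node));
        to_space   = calloc(capacity, sizeof(Node));
        used       = 0;
    }

    /* Allocate a node from the current halfspace; there is no per-node
       free, because compact() reclaims unreachable nodes wholesale. */
    Node *alloc_node(void)
    {
        if (used == capacity)
            return NULL;            /* caller should compact() and retry */
        Node *n = &from_space[used++];
        memset(n, 0, sizeof *n);
        return n;
    }

    /* Copy one node into to_space (if not already copied) and return its
       new address; the forwarding pointer makes this idempotent. */
    static Node *forward(Node *n, size_t *alloc)
    {
        if (n == NULL)
            return NULL;
        if (n->forward)             /* already moved in this collection */
            return n->forward;
        Node *copy = &to_space[(*alloc)++];
        *copy = *n;                 /* children still point into from_space */
        copy->forward = NULL;
        n->forward = copy;
        return copy;
    }

    /* Breadth-first copy of everything reachable from roots[0..nroots-1].
       Every external pointer into the graph must be passed in as a root
       so it gets updated to the node's new address. */
    void compact(Node **roots, size_t nroots)
    {
        size_t alloc = 0, scan = 0;

        for (size_t i = 0; i < nroots; i++)
            roots[i] = forward(roots[i], &alloc);

        while (scan < alloc) {      /* the scan pointer chases the alloc pointer */
            Node *n = &to_space[scan++];
            for (int c = 0; c < NCHILD; c++)
                n->child[c] = forward(n->child[c], &alloc);
        }

        /* Swap the halfspaces; the old from_space is reused next cycle. */
        Node *tmp  = from_space;
        from_space = to_space;
        to_space   = tmp;
        used       = alloc;
    }

Because the copy proceeds breadth-first, each compaction pass lays a node's neighbors out next to one another in the new halfspace, which is exactly the locality property described above.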
If you're really concerned about the layout of memory, it might be worthwhile to manage it yourself.
You can malloc a large block of memory at startup and then allocate space out of that block yourself. You'll need a separate structure to keep track of what has and hasn't been allocated. If you know that all allocated structures are the same size, that simplifies allocated/free-space management, e.g. an array of indexes; otherwise you could thread a linked list of pointers through the free space. Given that you'll likely be allocating structs one at a time, you probably don't need to worry about keeping track of the smallest and/or largest contiguous block of free space.
One thing you'll need to be careful of is alignment. Again, if you always allocate memory in multiples of the size of a single struct, that makes things easier; otherwise it's a good idea to ensure that every allocation starts at a suitably aligned boundary, i.e. the difference between the address you hand out and the start address received from malloc is a multiple of the struct's alignment (at least 8 bytes on most 64-bit platforms).
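For the fixed-size case, a pool along these lines is a minimal sketch of what's described above; the Pool/Slot names and functions are made up for illustration. Each free slot stores a pointer to the next free slot, so the free-space bookkeeping costs no memory beyond the block itself, and alignment takes care of itself because every slot has the size and alignment of the union.

    /* Sketch of a fixed-size pool carved out of one big malloc'd block.
       The Node type and the pool API names are illustrative only. */
    #include <stdlib.h>

    typedef struct Node {
        struct Node *child[2];
        int payload;
    } Node;

    /* A free slot reuses its own storage to hold the next-free pointer. */
    typedef union Slot {
        union Slot *next_free;
        Node node;
    } Slot;

    typedef struct Pool {
        Slot  *slots;     /* the single large block obtained from malloc */
        Slot  *free_list; /* head of the intrusive free list */
        size_t count;
    } Pool;

    int pool_init(Pool *p, size_t count)
    {
        /* One malloc up front; malloc returns a maximally aligned block,
           and sizeof(Slot) is a multiple of the union's alignment, so
           every slot is properly aligned for Node. */
        if (count == 0)
            return -1;
        p->slots = malloc(count * sizeof(Slot));
        if (!p->slots)
            return -1;
        p->count = count;
        p->free_list = &p->slots[0];
        for (size_t i = 0; i + 1 < count; i++)
            p->slots[i].next_free = &p->slots[i + 1];
        p->slots[count - 1].next_free = NULL;
        return 0;
    }

    Node *pool_alloc(Pool *p)
    {
        if (!p->free_list)
            return NULL;              /* pool exhausted */
        Slot *s = p->free_list;
        p->free_list = s->next_free;
        return &s->node;
    }

    void pool_free(Pool *p, Node *n)
    {
        Slot *s = (Slot *)n;          /* the node lives inside a slot */
        s->next_free = p->free_list;
        p->free_list = s;
    }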
You can pass additional parameters to your custom allocation functions to give hints about where the block should be placed, such as the address of one or more nearby nodes.
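As a purely illustrative sketch of that last idea, the allocator below keeps a byte map of used slots and scans outward from the hinted node's slot, taking the nearest free one, so nodes that are wired together tend to land in adjacent slots. The HintPool type and hintpool_* names are hypothetical, and the linear scan is the simplest possible policy; a real implementation would likely use a bitmap with word-at-a-time scanning.

    /* Hypothetical allocator that accepts a "place me near this node" hint. */
    #include <stdlib.h>

    typedef struct Node {
        struct Node *child[2];
        int payload;
    } Node;

    typedef struct HintPool {
        Node          *slots;   /* one big block, as described above */
        unsigned char *used;    /* 1 = slot occupied */
        size_t         count;
    } HintPool;

    int hintpool_init(HintPool *p, size_t count)
    {
        p->slots = malloc(count * sizeof(Node));
        p->used  = calloc(count, 1);
        p->count = count;
        return (p->slots && p->used) ? 0 : -1;
    }

    /* Scan outward from the hinted node's slot and take the closest free
       slot, so graph neighbors become memory neighbors where possible. */
    Node *hintpool_alloc_near(HintPool *p, const Node *hint)
    {
        size_t center = hint ? (size_t)(hint - p->slots) : p->count / 2;

        for (size_t d = 0; d < p->count; d++) {
            if (center >= d && !p->used[center - d]) {
                p->used[center - d] = 1;
                return &p->slots[center - d];
            }
            if (center + d < p->count && !p->used[center + d]) {
                p->used[center + d] = 1;
                return &p->slots[center + d];
            }
        }
        return NULL;                  /* pool full */
    }

    void hintpool_free(HintPool *p, Node *n)
    {
        p->used[n - p->slots] = 0;
    }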