Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Struct memory hack to overlap object reference - Is it possible?

I'm guessing the answer to this is going to be "It's not possible, switch to C++". But I thought I'd throw it out there anyway.

I'm dealing with a massive binary tree. I've got an array of structs to represent the branch nodes that I use to help with locality of memory when iterating through the tree.

To save a bit of memory, and therefore improve cache locality I'm looking at overlapping an object reference for leaf nodes. That object reference will point to all the leaf data. Basically, something like this:

[StructLayout(LayoutKind.Explicit)]
struct BranchData
{
    [FieldOffset(0)] // 1 byte
    internal byte SplitIndex;
    [FieldOffset(1)] // 4 bytes
    internal float SplitValue;
    [FieldOffset(5)] // 4 bytes
    internal int LowIndex;
    [FieldOffset(9)] // 4 bytes
    internal int HighIndex;
    [FieldOffset(0)] // 8 bytes (We're working with x64 here)
    internal LeafData Node;
}

The above gives the following runtime error

Could not load type 'BranchData' from assembly 'WindowsFormsApplication1, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because it contains an object field at offset 0 that is incorrectly aligned or overlapped by a non-object field.

I could use a separate array to store the leaf data, and use the indexes to point to that array, but then I have 2 memory lookups (for what are certainly distant areas of memory). One for the location in the leaf array to get the reference, and one to get the leaf data. If I can achieve this overlap, I get rid of one of those lookups.

I am able to pin objects and use unsafe code to solve this problem. Speed is the key element here.

like image 349
Will Calderwood Avatar asked Jul 21 '13 11:07

Will Calderwood


1 Answers

This restriction is very fundamental in managed code. The problem is that your Node member is an object reference. A pointer at runtime. It is overlapped by the other fields.

The garbage collector needs to be able to find that pointer back. Necessary both to know that there's a live reference to the LeafData object on the heap. And to update that pointer when the LeafData object is moved when the heap is compacted.

Problem is: there's no way that the collector can tell whether your union stores that pointer. If it doesn't then there's a risk that the values of the other members will look like a valid object reference to the GC. And that's very, very bad.

Storing an unsafe LeafData* is technically possible but that requires the LeafData object to be pinned. That just can't work when the tree is big, the GC falls over when nothing can be moved anymore. Storing the LeafData data in unmanaged memory is further down the rabbit hole, you are starting to write C++ code by then. The only other thing you could do is store the LeafData in the node itself, as a struct, pretty unlikely you'll be happy with the fit.

Do note that you should avoid these misaligned fields, you'll get slammed pretty hard when a field spans an L1 cache line boundary. Put the SplitIndex after HighIndex so this can't happen.

like image 165
Hans Passant Avatar answered Nov 07 '22 09:11

Hans Passant