
How does the .NET CLR distinguish between Managed and Unmanaged Pointers?

Everything is ultimately JITed into native machine code, so we end up with a native stack in .NET, which the GC needs to scan for object pointers whenever it does a garbage collection.

Now, the question is: How does the .NET garbage collector figure out whether a value that looks like a pointer to an object inside the GC heap is actually a managed pointer, or just a random integer that happens to correspond to a valid address?

Obviously, if it can't distinguish the two, then there can be memory leaks, so I'm wondering how it works. Or -- dare I say it -- does .NET have the potential to leak memory? :O

user541686 asked Feb 23 '11 19:02



2 Answers

As others have pointed out, the GC knows precisely which fields of every block on the stack and the heap are managed references, because the GC and the jitter know the type of everything.

However, your point is well-taken. Imagine an entirely hypothetical world in which there are two kinds of memory management going on in the same process. For example, suppose you have an entirely hypothetical program called "InterMothra Chro-Nagava-Sploranator" written in C++ that uses traditional COM-style reference-counted memory management where everything is just a pointer to process memory, and objects are released by invoking a Release method the correct number of times. Suppose Sploranator hypothetically has a scripting language, JabbaScript, that maintains a garbage-collected pool of objects.

Trouble arises when a JabbaScript object has a reference to a non-managed Sploranator object, and that same Sploranator object has a reference right back. That's a circular reference that cannot be broken by the JabbaScript garbage collector, because it doesn't know about the memory layout of the Sploranator object. So there is the potential here for memory leaks.
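To make the cycle concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `HostObject` stands in for a reference-counted Sploranator object, `ScriptObject` for a JabbaScript object, and `script_gc` is a deliberately simplified collector that treats any object handed to host code as pinned, because it cannot see inside the host's memory. It is not how any real collector is implemented.

```python
class HostObject:
    """Hypothetical COM-style object: freed only when its count hits zero."""

    def __init__(self):
        self.ref_count = 0
        self.freed = False
        self.script_handle = None  # pointer to a script object, opaque to the script GC

    def add_ref(self):
        self.ref_count += 1

    def release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.freed = True
            if self.script_handle is not None:
                # Only now does the host let go of the script object.
                self.script_handle.pinned_by_host = False
                self.script_handle = None


class ScriptObject:
    """Hypothetical garbage-collected script object."""

    def __init__(self):
        self.host_ref = None         # counted reference to a HostObject
        self.pinned_by_host = False  # assumption: the GC must trust this flag
        self.collected = False


def script_gc(objects, roots):
    """Collect every script object that is neither a root nor pinned by the host."""
    for obj in objects:
        if obj in roots or obj.pinned_by_host or obj.collected:
            continue
        obj.collected = True
        if obj.host_ref is not None:
            obj.host_ref.release()
            obj.host_ref = None


host = HostObject()
host.add_ref()                 # the application's own reference
script = ScriptObject()

script.host_ref = host         # script -> host
host.add_ref()
host.script_handle = script    # host -> script
script.pinned_by_host = True

host.release()                 # the application drops its reference
script_gc([script], roots=[])  # and nothing in script land refers to `script`

leaked = (not host.freed) and (not script.collected)
```

Each side is waiting for the other to let go first: the host's count never reaches zero because the script object holds a reference, and the script object is never collected because the host still pins it.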

One way to solve this problem is to rewrite the Sploranator memory manager so that it allocates its objects out of the managed GC pool.

Another way is to use a heuristic; the GC can dedicate a thread of a processor to scan all of memory looking for integers that happen to be pointers to its objects. That sounds like a lot, but it can omit pages that are uncommitted, pages in its own managed heap, pages that are known to contain only code, and so on. The GC can make a guess that if it thinks an object might be dead, and it cannot find any pointer to that object in any memory outside of its control, then the object is almost certainly dead.
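A rough sketch of that scanning heuristic, in Python for brevity. The function name `conservative_scan`, the address ranges, and the sample words are all made up for illustration; a real scanner would walk committed memory pages rather than a Python list.

```python
def conservative_scan(foreign_words, heap_ranges):
    """Return the names of heap objects that some foreign word might point into.

    foreign_words: integers read from memory outside the GC's control.
    heap_ranges:   object name -> (start, end) address range, end exclusive.
    """
    maybe_live = set()
    for word in foreign_words:
        for name, (start, end) in heap_ranges.items():
            if start <= word < end:
                maybe_live.add(name)
    return maybe_live


# Three hypothetical objects in the managed pool.
heap_ranges = {
    "a": (0x1000, 0x1020),
    "b": (0x1020, 0x1040),
    "c": (0x1040, 0x1060),
}

# Words found in unmanaged memory: two look like pointers, two do not.
foreign_words = [0x1000, 42, 0x1048, 0xDEADBEEF]

maybe_live = conservative_scan(foreign_words, heap_ranges)
presumed_dead = set(heap_ranges) - maybe_live
```

Here `a` and `c` are kept because some word falls inside their ranges, while `b` is presumed dead because no word anywhere in the scanned memory could be a pointer to it.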

The downside of this heuristic is of course that it can be wrong. You might have an integer that accidentally matches a pointer (though that is less likely in 64-bit land). That would extend the lifetime of the object. But who cares? We are already in the situation where circular references can extend the lifetimes of objects. We're trying to make that situation better, and this heuristic does so. That it is not perfect is irrelevant; it's better than nothing.

The other way it can be wrong is that Sploranator could have encoded the pointer, by, say, flipping all of its bits when storing the value and only flipping it back right before the call. If Sploranator is actively hostile to this GC heuristic strategy then it doesn't work.
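Both failure modes can be shown with a few lines of Python. This is only an illustration under assumed values: the 64-bit word size, the object's address range, and the helper `points_into` are all hypothetical.

```python
MASK = 0xFFFF_FFFF_FFFF_FFFF  # assumption: 64-bit words


def points_into(word, start, end):
    """True if `word`, read as an address, lands inside [start, end)."""
    return start <= word < end


# A hypothetical managed object occupying 64 bytes.
obj_start, obj_end = 0x7F00_0000_1000, 0x7F00_0000_1040

# Failure mode 1: an ordinary integer that happens to fall in the range.
# The scan keeps the object alive for no good reason.
coincidence = 0x7F00_0000_1008
falsely_retained = points_into(coincidence, obj_start, obj_end)

# Failure mode 2: the host stores the pointer with all bits flipped.
# The scan sees no match, yet the host can still recover the address.
encoded = obj_start ^ MASK
hidden_from_scan = not points_into(encoded, obj_start, obj_end)
decoded = encoded ^ MASK
```

The first case only wastes memory; the second is the dangerous one, since the object could be freed while the host still intends to use it.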

Resemblance between the garbage collection strategy outlined here and the actual GC strategy of any product is almost entirely coincidental. Eric's musings about implementation details of garbage collectors of hypothetical non-existing products are for entertainment purposes only.

Eric Lippert answered Sep 18 '22 02:09



The garbage collector doesn't need to infer whether a particular byte pattern (whether 4 or 8 bytes) is a pointer or not - it already knows.

In the CLR everything is strongly typed, so the garbage collector knows whether the bytes are an int, a long, an object reference, an untyped pointer, and so on.

The layout of an object in memory is defined at compile time - metadata stored in the assembly gives the type and location of every member of the instance.
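As a toy model of how such metadata lets a collector trace the heap without guessing, consider the Python sketch below. The `REF_FIELDS` table, the type names, and the `mark` function are invented for illustration and do not reflect the CLR's actual data structures.

```python
# Hypothetical type descriptors: for each type, the indexes of the fields
# that hold object references. Every other field is plain data.
REF_FIELDS = {
    "Node": (0,),  # field 0 is `next` (a reference), field 1 is `value` (an int)
    "Leaf": (),    # no reference fields at all
}


def mark(heap, roots):
    """Return the set of reachable addresses.

    heap:  address -> (type_name, list of field values)
    roots: addresses known to be live references
    """
    marked = set()
    pending = list(roots)
    while pending:
        addr = pending.pop()
        if addr == 0 or addr in marked:  # 0 plays the role of null
            continue
        marked.add(addr)
        type_name, fields = heap[addr]
        for index in REF_FIELDS[type_name]:
            pending.append(fields[index])  # follow reference fields only
    return marked


heap = {
    0x1000: ("Node", [0x1020, 0x1040]),  # next -> 0x1020, value is the int 0x1040
    0x1020: ("Node", [0, 5]),            # next is null, value is 5
    0x1040: ("Leaf", [99]),
}

reachable = mark(heap, roots=[0x1000])
```

The object at `0x1040` is not marked even though an integer field holds that exact value, because the descriptor says that field is not a reference.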

The layout of stack frames is similar - the JITter lays out the stack frame when the method is compiled, and keeps track of what kinds of data are stored where. (It's done by the JITter to allow for different optimizations depending on the capabilities of your processor.)

When the garbage collector runs, it has access to all this metadata, so it never needs to guess whether a specific bit pattern might be a reference or not.
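The same idea applies to stack frames. In the sketch below, `ref_slots` is a stand-in for whatever per-method bookkeeping the JITter produces; the name, the slot numbering, and the frame contents are assumptions made for the example.

```python
def precise_roots(frame_words, ref_slots):
    """Return the non-null references stored in a stack frame.

    frame_words: raw words in the frame, one per slot.
    ref_slots:   indexes of the slots recorded as holding object references.
    """
    return [frame_words[i] for i in sorted(ref_slots) if frame_words[i] != 0]


# A hypothetical frame with four slots.
frame_words = [
    0x1000,  # slot 0: object reference
    0x1020,  # slot 1: an int whose value happens to look like a heap address
    7,       # slot 2: an int
    0,       # slot 3: a null object reference
]
ref_slots = {0, 3}

roots = precise_roots(frame_words, ref_slots)
```

Slot 1 is ignored no matter what value it holds, which is exactly why an integer that resembles an address cannot keep an object alive under this scheme.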

Eric Lippert's blog is a good place to find out more - "References are not addresses" would be a place to start.

Bevan answered Sep 17 '22 02:09
