
How does the GC update references after compaction occurs?

The .NET garbage collector collects objects (reclaims their memory) and also compacts memory (to keep fragmentation to a minimum).

Since an application may hold many references to an object, how does the GC (or the CLR) keep those references valid when the object's address changes because of compaction?

lysergic-acid asked May 19 '12 22:05



2 Answers

The concept is simple enough: the garbage collector updates every reference to a moved object so that it points to the object's new location.

The implementation is a bit trickier. There is no real difference between native and managed code; both are machine code. And there's nothing special about an object reference; it is just a pointer at runtime. What's needed is a reliable way for the collector to find these pointers and recognize them as the kind that reference a managed object. That is needed not just to update them when the pointed-to object is moved during compaction, but also to recognize live references, so that an object is not collected too soon.

That's simple for object references stored in fields of objects on the GC heap: the CLR knows the layout of each object and which fields hold a pointer. It is not so simple for object references stored on the stack or in a CPU register, such as local variables and method arguments.
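To make the heap case concrete, here is a minimal Python sketch of a sliding compaction that uses per-type layout information to decide which fields to fix up. It is only a toy model, not how the CLR is implemented: `TypeInfo`, `HeapObject`, `compact`, the word-based sizes, and the precomputed `live` flag are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TypeInfo:
    name: str
    size: int            # object size in words (assumed unit)
    ref_fields: tuple    # names of the fields that hold object references


@dataclass
class HeapObject:
    type: TypeInfo
    address: int
    fields: dict         # field name -> address, None, or plain data
    live: bool = True    # assumed to come from an earlier mark phase


def compact(heap, base=0):
    """Slide live objects toward `base` and fix up every reference field.

    `heap` must be ordered by address. Returns (survivors, forwarding).
    """
    # Pass 1: decide where each live object will end up.
    forwarding = {}
    next_free = base
    for obj in heap:
        if obj.live:
            forwarding[obj.address] = next_free
            next_free += obj.type.size

    survivors = [obj for obj in heap if obj.live]

    # Pass 2: rewrite only the fields the type layout marks as references.
    for obj in survivors:
        for name in obj.type.ref_fields:
            target = obj.fields[name]
            if target is not None:
                obj.fields[name] = forwarding[target]

    # Pass 3: "move" the objects by assigning their new addresses.
    for obj in survivors:
        obj.address = forwarding[obj.address]

    return survivors, forwarding
```

With a three-word node type whose only reference field is `next`, a live object at address 6 slides down to 3 when the object at address 3 is dead. A `next` field holding 6 becomes 3, while an integer field that happens to hold 6 is left alone, which is exactly why the collector needs the layout.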

The key property that distinguishes executing managed code from native code is that the CLR can reliably walk the stack frames owned by managed code. It does so by restricting the kind of code used to set up a stack frame. This is not typically possible in native code; the "frame pointer omission" optimization is particularly nasty.

Walking the stack frames first of all lets the collector find object references stored on the stack. It also tells the collector that the thread is currently executing managed code, so the CPU registers should be checked for references as well. A transition from managed code to native code writes a special "cookie" on the stack that the collector recognizes. It then knows that the stack frames that follow should not be checked, because they contain arbitrary pointer values that never reference a managed object.
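The stack walk can be modelled roughly as follows. This Python sketch is a simplification under stated assumptions: the stack is a list ordered from outermost caller to innermost callee, the transition cookies are plain marker strings, and each managed frame carries a list of slot indexes known to hold references. `Frame`, `find_stack_roots`, and both marker constants are hypothetical names, not CLR data structures.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the transition "cookies" written on the stack.
MANAGED_TO_NATIVE = "managed-to-native"
NATIVE_TO_MANAGED = "native-to-managed"


@dataclass
class Frame:
    name: str
    slots: list            # raw word values stored in the frame
    ref_slots: tuple = ()  # slot indexes known to hold object references


def find_stack_roots(stack):
    """Return (frame name, slot index) for every non-null reference slot.

    Frames between a managed-to-native marker and the next
    native-to-managed marker are skipped: their words may look like
    pointers but are never treated as object references.
    Assumes the thread started out in managed code.
    """
    roots = []
    in_managed = True
    for entry in stack:
        if not isinstance(entry, Frame):
            # A transition marker: switch scanning on or off.
            in_managed = (entry == NATIVE_TO_MANAGED)
            continue
        if not in_managed:
            continue
        for index in entry.ref_slots:
            if entry.slots[index] is not None:
                roots.append((entry.name, index))
    return roots
```

The slots found this way are the stack roots: they keep their targets alive and are rewritten when those targets move. The words in a native frame are never touched, even when they happen to equal a valid object address.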

You can see this in the debugger when you enable unmanaged code debugging. Look at the Call Stack window and note the [Native to Managed Transition] and [Managed to Native Transition] annotations. That's the debugger recognizing those cookies. They matter to the debugger as well, since it needs to know whether the Locals window can display anything meaningful. The stack walk is also exposed in the framework through the StackTrace and StackFrame classes. And it is very important for sandboxing: Code Access Security (CAS) performs stack walks.

Hans Passant answered Nov 21 '22 17:11


For simplicity, I'll assume a stop-the-world GC in which no objects are pinned, every object gets scanned and relocated on every GC cycle, and none of the destinations overlap any of the sources. In actuality, the .NET GC is a bit more complicated, but this should give a good feel for how things work.

Each time a reference is examined, there are three possibilities:

  1. It's null. In that case, no action is required.

  2. It identifies an object whose header says it is something other than a relocation marker (a special kind of object described below). In that case, move the object to a new location and replace the original with a three-word relocation marker containing the new location, the old location of the object that contains the just-observed reference to the present object, and the offset of that reference within that object. Then start scanning the new object (the system can forget, for the moment, about the object it had been scanning, since it has just recorded that object's address).

  3. It identifies an object whose header says it's a relocation marker. In that case, update the reference being scanned to reflect the new address.

Once the system finishes scanning the present object, it can look at its old location to find out what it was doing before it started scanning the present object.
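The three cases and the "look at the old location to resume" step can be sketched in Python as below. This is an illustration under the same simplifying assumptions as above (stop-the-world, nothing pinned, non-overlapping destinations), plus a few of my own: the heap is a dict from address to object, every field is a reference or `None`, each object occupies one address, and new addresses start at a hypothetical `TO_SPACE_BASE`. `Obj`, `Marker`, and `relocate_all` are made-up names.

```python
from dataclasses import dataclass
from typing import Optional

TO_SPACE_BASE = 1000  # assumed: new addresses never collide with old ones


@dataclass
class Obj:
    fields: list  # each entry is an address or None


@dataclass
class Marker:
    new_addr: int               # where the object now lives
    parent_old: Optional[int]   # old address of the object we were scanning
    offset: int                 # field index in the parent being scanned


def relocate_all(old, roots):
    """Move everything reachable from `roots`; return (new heap, new roots)."""
    new = {}
    next_addr = TO_SPACE_BASE

    def move(addr, parent_old, offset):
        nonlocal next_addr
        new_addr = next_addr
        next_addr += 1
        new[new_addr] = Obj(list(old[addr].fields))
        old[addr] = Marker(new_addr, parent_old, offset)  # overwrite old copy
        return new_addr

    new_roots = []
    for root in roots:
        if root is None:                                 # case 1
            new_roots.append(None)
            continue
        if isinstance(old[root], Marker):                # case 3
            new_roots.append(old[root].new_addr)
            continue
        cur_old, cur_new, i = root, move(root, None, 0), 0   # case 2
        new_roots.append(cur_new)
        while True:
            fields = new[cur_new].fields
            if i < len(fields):
                ref = fields[i]
                if ref is None:                          # case 1
                    i += 1
                elif isinstance(old[ref], Marker):       # case 3
                    fields[i] = old[ref].new_addr
                    i += 1
                else:                                    # case 2
                    child_new = move(ref, cur_old, i)
                    fields[i] = child_new
                    cur_old, cur_new, i = ref, child_new, 0
            else:
                # Done with this object: its old location says what to resume.
                marker = old[cur_old]
                if marker.parent_old is None:
                    break
                i = marker.offset + 1
                cur_old = marker.parent_old
                cur_new = old[cur_old].new_addr
    return new, new_roots
```

Note that the loop keeps no stack of its own: the only bookkeeping between objects lives in the markers written over the old copies, which is the point of the scheme. Cycles are handled by case 3, since an object that has already been moved is seen as a marker the second time around.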

Once an object has been relocated, the former contents of its first three words will be available at its new location and will no longer be needed at the old one. Because the offset into an object will always be a multiple of four, and individual objects are limited to 2GB each, only a fraction of all possible 32-bit values would be needed to hold all possible offsets. Provided that at least one word in an object's header has at least 2^29 values it can never hold for anything other than an object-relocation marker, and provided every object is allocated at least twelve bytes, it's possible for object scanning to handle any depth of tree without requiring any depth-dependent storage outside the space occupied by old copies of objects whose content is no longer needed.
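The 2^29 figure follows directly from the two limits just mentioned. A quick Python check of the arithmetic, assuming "2GB" means 2^31 bytes (the constant names are mine):

```python
MAX_OBJECT_BYTES = 2 ** 31   # the 2GB per-object limit, read as 2^31 bytes
WORD_BYTES = 4               # offsets are always a multiple of four

# Number of distinct word-aligned offsets inside the largest possible object.
offset_count = MAX_OBJECT_BYTES // WORD_BYTES

# Share of all 32-bit values needed to encode any such offset.
fraction_of_32_bit = offset_count / 2 ** 32
```

So `offset_count` is 2^29, one eighth of the 32-bit range, which is why a header word needs that many otherwise-unused values for the marker encoding to work.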

supercat answered Nov 21 '22 15:11