Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Boehm GC work for C program?

I checked Boehm GC. The GC for C/C++.

I know mark-and-sweep algorithm. What I'm in curious is how it picks up only pointers in whole C memory. My understanding about C memory is just a plain byte array. Is it possible to determine a value in memory is pointer or not?

like image 299
eonil Avatar asked Jan 25 '11 16:01

eonil


People also ask

How does a garbage collector work in C?

Garbage Collection (GC) is a mechanism that provides automatic memory reclamation for unused memory blocks. Programmers dynamically allocate memory, but when a block is no longer needed, they do not have to return it to the system explicitly with a free() call.

Does C language have garbage collection?

C does not have automatic garbage collection. If you lose track of an object, you have what is known as a 'memory leak'. The memory will still be allocated to the program as a whole, but nothing will be able to use it if you've lost the last pointer to it. Memory resource management is a key requirement on C programs.

What does GC () do?

GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object. Despite what you might have read elsewhere, there's never any need to call gc() yourself.

How does GC affect performance?

An application that spends 1% of its execution time on garbage collection will loose more than 20% throughput on a 32-processor system. If we increase the GC time to 2%, the overall throughput will drop by another 20%. Such is the impact of suspending 32 executing threads simultaneously!


1 Answers

The Boehm GC is a conservative collector, which means it assumes everything is a pointer. This means that it can find false positive references, like an integer which coincidentally has the value of an address in the heap. As a result, some blocks may stay in memory longer than they would with a non-conservative collector.

Here's a description from Boehm's page:

The garbage collector uses a modified mark-sweep algorithm. Conceptually it operates roughly in four phases, which are performed occasionally as part of a memory allocation:

  1. Preparation Each object has an associated mark bit. Clear all mark bits, indicating that all objects are potentially unreachable.
  2. Mark phase Marks all objects that can be reachable via chains of pointers from variables. Often the collector has no real information about the location of pointer variables in the heap, so it views all static data areas, stacks and registers as potentially containing pointers. Any bit patterns that represent addresses inside heap objects managed by the collector are viewed as pointers. Unless the client program has made heap object layout information available to the collector, any heap objects found to be reachable from variables are again scanned similarly.
  3. Sweep phase Scans the heap for inaccessible, and hence unmarked, objects, and returns them to an appropriate free list for reuse. This is not really a separate phase; even in non incremental mode this is operation is usually performed on demand during an allocation that discovers an empty free list. Thus the sweep phase is very unlikely to touch a page that would not have been touched shortly thereafter anyway.
  4. Finalization phase Unreachable objects which had been registered for finalization are enqueued for finalization outside the collector.

You should also know that the Boehm GC needs to be given a set of "roots", which are starting points for the mark-and-sweep algorithm. The stack and registers are automatically roots. You need to explicitly add global pointers as roots.


EDIT: In comments, some concerns were pointed out about conservative collectors in general. It is true that integers that look like heap pointers to the collector can cause memory not to be released. This is not as big of a problem as you might think. Most scalar integers in a program are used for counts and sizes and are fairly small (so they would not look like heap pointers). You would mostly run into problems with arrays containing bitmaps, strings, floating point data, or anything of that sort. Boehm GC lets you allocate a block with GC_MALLOC_ATOMIC which indicates to the collector that the block will not contain any pointers. If you look in gc_typed.h, you will also find ways to specify what parts of a block may contain pointers.

That said, a fundamental limitation of a conservative collector is that it cannot safely move memory around during collection since pointer rewriting is not safe. This means you won't get any of the benefits of compaction like lowered fragmentation and improved cache performance.

like image 112
Jay Conrod Avatar answered Sep 20 '22 12:09

Jay Conrod