Why no Reference Counting + Garbage Collection in C#?

I come from a C++ background and I've been working with C# for about a year. Like many others, I'm flummoxed as to why deterministic resource management is not built into the language. Instead of deterministic destructors we have the dispose pattern. People start to wonder whether spreading the IDisposable cancer through their code is worth the effort.

In my C++-biased brain it seems like using reference-counted smart pointers with deterministic destructors is a major step up from a garbage collector that requires you to implement IDisposable and call Dispose to clean up your non-memory resources. Admittedly, I'm not very smart... so I'm asking this purely from a desire to better understand why things are the way they are.
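For readers less familiar with the C++ model being referred to, here is a minimal sketch of reference-counted ownership with a deterministic destructor, using `std::shared_ptr`. The `Resource` class and `run_refcount_demo` function are hypothetical names made up for illustration; the event log exists only so the order of cleanup can be observed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-memory resource (file handle, socket, ...).
// It records events in a log so the order of cleanup is observable.
class Resource {
public:
    Resource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    // Runs exactly when the last shared_ptr owner goes away.
    ~Resource() { log_.push_back("release " + name_); }

    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the event log produced by sharing one Resource between two owners.
std::vector<std::string> run_refcount_demo() {
    std::vector<std::string> log;
    {
        auto first = std::make_shared<Resource>("handle", log);
        {
            std::shared_ptr<Resource> second = first;  // count goes 1 -> 2
            log.push_back("count " + std::to_string(first.use_count()));
        }  // second destroyed: count goes 2 -> 1, nothing released yet
        log.push_back("count " + std::to_string(first.use_count()));
    }  // first destroyed: count hits 0, destructor runs here
    log.push_back("scope left");
    return log;
}
```

In this sketch the "release" entry is logged before "scope left", which is the deterministic behaviour at issue: cleanup happens at the moment the last owner disappears, with no explicit Dispose-style call.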

What if C# were modified such that:

Objects are reference counted. When an object's reference count goes to zero, a resource cleanup method is called deterministically on the object, then the object is marked for garbage collection. Garbage collection occurs at some non-deterministic time in the future at which point memory is reclaimed. In this scenario you don't have to implement IDisposable or remember to call Dispose. You just implement the resource cleanup function if you have non-memory resources to release.

  • Why is that a bad idea?
  • Would that defeat the purpose of the garbage collector?
  • Would it be feasible to implement such a thing?

EDIT: From the comments so far, this is a bad idea because

  1. GC is faster without reference counting
  2. problem of dealing with cycles in the object graph

I think number one is valid, but number two is easy to deal with using weak references.
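To make the cycle problem and the weak-reference workaround concrete, here is a small C++ sketch using `std::shared_ptr` and `std::weak_ptr`. The `Node` type, the `g_live_nodes` counter and both function names are assumptions invented for this example.

```cpp
#include <memory>

// Counts live Node objects so a leak is observable (illustrative only).
static int g_live_nodes = 0;

struct Node {
    std::shared_ptr<Node> strong_peer;  // owning link: keeps the peer alive
    std::weak_ptr<Node> weak_peer;      // non-owning link: does not
    Node() { ++g_live_nodes; }
    ~Node() { --g_live_nodes; }
};

// Two nodes that own each other: each count stays at 1 after the locals go away.
int live_after_strong_cycle() {
    g_live_nodes = 0;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;
    }
    return g_live_nodes;  // neither destructor has run: both nodes leaked
}

// Same shape, but the back link is weak, so the cycle is broken.
int live_after_weak_back_link() {
    g_live_nodes = 0;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }
    return g_live_nodes;  // both destructors have run
}
```

The weak link does fix the leak, but only because whoever wrote `Node` decided in advance which direction of the relationship is the owning one. That decision has to be made for every cyclic structure in a program.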

So does the speed optimization outweigh the cons that you:

  1. may not free a non-memory resource in a timely manner
  2. might free a non-memory resource too soon

If your resource cleanup mechanism is deterministic and built into the language, you can eliminate those possibilities.

Skrymsli asked May 15 '09 05:05



2 Answers

Brad Abrams posted an e-mail from Brian Harry written during development of the .NET Framework. It details many of the reasons reference counting was not used, even though one of the early priorities was to keep semantic equivalence with VB6, which uses reference counting. It looks into possibilities such as having some types ref counted and not others (IRefCounted!), or having specific instances ref counted, and explains why none of these solutions were deemed acceptable.

Because [the issue of resource management and deterministic finalization] is such a sensitive topic I am going to try to be as precise and complete in my explanation as I can. I apologize for the length of the mail. The first 90% of this mail is trying to convince you that the problem really is hard. In that last part, I'll talk about things we are trying to do but you need the first part to understand why we are looking at these options.

...

We initially started with the assumption that the solution would take the form of automatic ref counting (so the programmer couldn't forget) plus some other stuff to detect and handle cycles automatically. ...we ultimately concluded that this was not going to work in the general case.

...

In summary:

  • We feel that it is very important to solve the cycle problem without forcing programmers to understand, track down and design around these complex data structure problems.
  • We want to make sure we have a high performance (both speed and working set) system and our analysis shows that using reference counting for every single object in the system will not allow us to achieve this goal.
  • For a variety of reasons, including composition and casting issues, there is no simple transparent solution to having just those objects that need it be ref counted.
  • We chose not to select a solution that provides deterministic finalization for a single language/context because it inhibits interop with other languages and causes bifurcation of class libraries by creating language specific versions.

Lucas answered Sep 17 '22 04:09


The garbage collector does not require you to write a Dispose method for every class/type that you define. You only define one when you need to explicitly clean something up, typically native resources that you have explicitly allocated. Most of the time there is nothing for you to do: if all you did was new up an object, the GC simply reclaims its memory.

The GC does something loosely equivalent to reference counting, but in a different way: every time it does a collection, it finds which objects are 'reachable' (the equivalent of Ref Count > 0). It just doesn't do it with an integer counter. Unreachable objects (the equivalent of Ref Count = 0) are collected. This way the runtime doesn't have to do housekeeping or update tables every time a reference is assigned or released, which should be faster.
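As a rough illustration of the reachability idea, here is a toy mark-and-sweep pass in C++. It is a sketch under simplifying assumptions (objects are plain indices, references are index lists, and the roots are supplied by the caller) and is not a description of how the CLR collector is actually implemented; `Heap`, `mark_reachable` and `find_garbage` are made-up names.

```cpp
#include <cstddef>
#include <vector>

// Toy heap: object i holds references to the objects listed in refs[i].
// This is an illustrative model, not how the CLR represents objects.
using Heap = std::vector<std::vector<std::size_t>>;

// Mark phase: returns reachable[i] == true for every object that can be
// reached by following references from the roots.
std::vector<bool> mark_reachable(const Heap& refs,
                                 const std::vector<std::size_t>& roots) {
    std::vector<bool> reachable(refs.size(), false);
    std::vector<std::size_t> pending(roots.begin(), roots.end());
    while (!pending.empty()) {
        std::size_t obj = pending.back();
        pending.pop_back();
        if (reachable[obj]) continue;  // already visited
        reachable[obj] = true;
        for (std::size_t target : refs[obj]) pending.push_back(target);
    }
    return reachable;
}

// Sweep phase (simplified): lists the objects a collection would reclaim.
std::vector<std::size_t> find_garbage(const Heap& refs,
                                      const std::vector<std::size_t>& roots) {
    std::vector<bool> reachable = mark_reachable(refs, roots);
    std::vector<std::size_t> garbage;
    for (std::size_t i = 0; i < refs.size(); ++i)
        if (!reachable[i]) garbage.push_back(i);
    return garbage;
}
```

Take a heap where object 0 references object 1, objects 2 and 3 reference each other, and only object 0 is a root. `find_garbage` then reports 2 and 3. A cycle that nothing reachable points to is collected without any weak references, and no per-assignment bookkeeping is involved; the work happens only when a collection runs.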

The only major difference between C++ (deterministic) and C# (non-deterministic) is when the object is cleaned up. You can't predict the exact moment an object will be collected in C#.

Umpteenth plug: I'd recommend reading Jeffrey Richter's standup chapter on the GC in CLR via C# in case you're really interested in how the GC works.

Gishu answered Sep 17 '22 04:09