How does Apple's Objective-C runtime do multithreaded reference counting without degraded performance?

So I was reading this article about an attempt to remove the global interpreter lock (GIL) from the Python interpreter to improve multithreading performance and saw something interesting.

It turns out that one of the places where removing the GIL actually made things worse was in memory management:

With free-threading, reference counting operations lose their thread-safety. Thus, the patch introduces a global reference-counting mutex lock along with atomic operations for updating the count. On Unix, locking is implemented using a standard pthread_mutex_t lock (wrapped inside a PyMutex structure) and the following functions...

...On Unix, it must be emphasized that simple reference count manipulation has been replaced by no fewer than three function calls, plus the overhead of the actual locking. It's far more expensive...

...Clearly fine-grained locking of reference counts is the major culprit behind the poor performance, but even if you take away the locking, the reference counting performance is still very sensitive to any kind of extra overhead (e.g., function call, etc.). In this case, the performance is still about twice as slow as Python with the GIL.

and later:

Reference counting is a really lousy memory-management technique for free-threading. This was already widely known, but the performance numbers put a more concrete figure on it. This will definitely be the most challenging issue for anyone attempting a GIL removal patch.

So the question is, if reference counting is so lousy for threading, how does Objective-C do it? I've written multithreaded Objective-C apps, and haven't noticed much of an overhead for memory management. Are they doing something else? Like some kind of per object lock instead of a global one? Is Objective-C's reference counting actually technically unsafe with threads? I'm not enough of a concurrency expert to really speculate much, but I'd be interested in knowing.

Mike Akers asked Dec 18 '12 22:12




1 Answer

There is overhead, and it can be significant in rare cases (micro-benchmarks, for example ;), regardless of the many optimizations that are in place. The normal case, though, is optimized for uncontended manipulation of an object's reference count.

So the question is, if reference counting is so lousy for threading, how does Objective-C do it?

There are multiple locks in play and, effectively, a retain/release on any given object selects an arbitrary lock (but always the same lock) for that object. This reduces lock contention without requiring one lock per object.

(And what Catfish_man said; some classes will implement their own reference counting scheme to use class-specific locking primitives to avoid contention and/or optimize for their specific needs.)

The implementation details are more complex.
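
As a rough illustration of the "many locks, same lock per object" idea described above, here is a minimal sketch in C with pthreads. It is not the runtime's actual code: the stripe count, the address hash, and every name (`STRIPE_COUNT`, `lock_for`, `object_retain`, `object_release`, `stress_refcount`) are assumptions made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical number of locks shared by all objects. */
#define STRIPE_COUNT 8

static pthread_mutex_t stripes[STRIPE_COUNT];
static pthread_once_t stripes_once = PTHREAD_ONCE_INIT;

static void stripes_init(void) {
    for (int i = 0; i < STRIPE_COUNT; i++)
        pthread_mutex_init(&stripes[i], NULL);
}

/* Illustrative object: just a reference count. */
typedef struct { long refcount; } Object;

/* Hash the object's address to a stripe. The choice looks arbitrary,
 * but the same address always maps to the same stripe. */
static size_t stripe_index(const void *obj) {
    uintptr_t addr = (uintptr_t)obj;
    return (size_t)(((addr >> 4) ^ (addr >> 9)) % STRIPE_COUNT);
}

static pthread_mutex_t *lock_for(const void *obj) {
    pthread_once(&stripes_once, stripes_init);  /* init locks exactly once */
    return &stripes[stripe_index(obj)];
}

static Object *object_new(void) {
    Object *o = malloc(sizeof *o);
    if (o) o->refcount = 1;  /* the creator owns one reference */
    return o;
}

static void object_retain(Object *o) {
    pthread_mutex_t *m = lock_for(o);
    pthread_mutex_lock(m);
    o->refcount++;
    pthread_mutex_unlock(m);
}

/* Drops one reference; returns 1 if the object was freed, else 0. */
static int object_release(Object *o) {
    pthread_mutex_t *m = lock_for(o);
    pthread_mutex_lock(m);
    assert(o->refcount > 0);
    long remaining = --o->refcount;
    pthread_mutex_unlock(m);
    if (remaining == 0) { free(o); return 1; }
    return 0;
}

typedef struct { Object *obj; int iterations; } WorkerArgs;

static void *worker(void *p) {
    WorkerArgs *a = p;
    for (int i = 0; i < a->iterations; i++) {
        object_retain(a->obj);   /* balanced pair: never drops to zero, */
        object_release(a->obj);  /* because the caller holds a reference */
    }
    return NULL;
}

/* Hammers one shared object from several threads (capped at 16) and
 * returns the count seen after all threads finish, or -1 on failure. */
static long stress_refcount(int nthreads, int iterations) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    Object *obj = object_new();
    if (!obj) return -1;
    WorkerArgs args = { obj, iterations };
    int started = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, &args) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    long final_count = obj->refcount;  /* safe: all workers have been joined */
    object_release(obj);
    return final_count;
}
```

Two threads touching different objects will usually land on different stripes and not block each other, while two threads touching the same object always serialize on the same mutex. A real implementation would tune the stripe count and hash, and would likely use something cheaper than a pthread mutex on the fast path.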

Is Objective-C's reference counting actually technically unsafe with threads?

Nope, it is thread-safe.

In reality, typical code will call retain and release quite infrequently, compared to other operations. Thus, even if there were significant overhead on those code paths, it would be amortized across all the other operations in the app (where, say, pushing pixels to the screen is really expensive, by comparison).

If an object is shared across threads (bad idea, in general), then the locking overhead protecting the data access and manipulation will generally be vastly greater than the retain/release overhead because of the infrequency of retaining/releasing.


As far as the overhead seen in the Python GIL-removal patch is concerned, I would bet that it has more to do with how often the reference count is incremented and decremented as a part of normal interpreter operations.

bbum answered Sep 18 '22 14:09
