Explaining virtual dispatch to someone is easy: every object has a pointer to a table as part of its data. If the class has N virtual methods, the table has N entries. A call to a particular method i follows the object's table pointer when the object arrives and calls the ith method in the table. Every class that implements method X() has the code for X() at the same index i.
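To make that concrete, here is a minimal sketch of the scheme with the table built by hand. All of the names (`Object`, `VTable`, `CallVirtual`, the shape functions) are made up for illustration; this is the textbook layout, not anything taken from the CLR.

```cpp
// Hand-rolled virtual dispatch. Every name here is hypothetical.
struct Object;
using Method = int (*)(Object* self);

struct VTable {
    Method slots[2];  // slot 0: Area, slot 1: Sides
};

struct Object {
    const VTable* vtable;  // the pointer to the table, stored in the object
    int size;
};

// Each virtual method has one fixed index, shared by every class.
enum Slot { kArea = 0, kSides = 1 };

int SquareArea(Object* self) { return self->size * self->size; }
int SquareSides(Object*) { return 4; }
int TriangleArea(Object* self) { return self->size * self->size / 2; }
int TriangleSides(Object*) { return 3; }

// One table per class; the same method sits at the same index in both.
const VTable kSquareVTable = {{SquareArea, SquareSides}};
const VTable kTriangleVTable = {{TriangleArea, TriangleSides}};

// A virtual call: load the table pointer, index it, call through it.
int CallVirtual(Object* obj, Slot slot) {
    return obj->vtable->slots[slot](obj);
}
```

The caller never needs to know the concrete class: `CallVirtual(&obj, kArea)` works for any object whose table puts its area method in slot 0.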
But then we get interfaces. Interfaces require some sort of contortion, because two classes unrelated by inheritance that both implement the same interface will have the interface's virtual functions at different indexes in their tables.
I have searched the Internet, and there are many discussions I can find about how interface dispatching can be implemented. There are two broad categories: (a) some sort of hash table lookup on the object to find the right dispatch table; (b) when the object is cast to the interface, a new pointer is created that points to the same data but to a different vtable.
But despite lots of info about how it can work, I can find nothing about how the .NET runtime engine actually implements it.
Does anyone know of a document that describes the actual pointer arithmetic that happens at a callvirt instruction when the object type is an interface?
Interface dispatching in the CLR is black magic.
As you correctly note, virtual method dispatch is conceptually easy to explain. And in fact I do so in this series of articles, where I describe how you could implement virtual methods in a C#-like language that lacked them:
http://blogs.msdn.com/b/ericlippert/archive/2011/03/17/implementing-the-virtual-method-pattern-in-c-part-one.aspx
The mechanisms I describe are quite similar to the mechanisms actually used.
Interface dispatch is much harder to describe, and the way the CLR implements it is not at all apparent. The CLR mechanisms for interface dispatch have been carefully tuned to provide high performance for the most common situations, and the details of those mechanisms are therefore subject to change as the CLR team develops more knowledge about real-world patterns of usage.
Basically the way it works behind the scenes is that at each call site -- that is, each point in the code where an interface method is invoked -- there is a little cache that says "I think the method associated with this interface slot is... here". The vast majority of the time, that cache is right; you very seldom call the same interface method a million times with a million different implementations. It's usually the same implementation over and over again, many times in a row.
If the call-site cache turns out to be a miss, then the runtime falls back to a hash table that it maintains and does a slightly slower lookup.
If that turns out to be a miss, then the object metadata is analyzed to determine what method corresponds to the interface slot.
The net effect is that at a given call site, if you always invoke an interface method that maps to a particular class method, it is very fast. If you always invoke one of a handful of class methods for a given interface method, performance is pretty good. The worst thing to do is to never invoke the same class method twice with the same interface method at the same site; that takes the slowest path every time.
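The three tiers above can be modelled roughly like this. To be clear, this is a toy under stated assumptions, not the CLR's code: the type and slot ids, `CallSite`, `g_resolve_cache` and `ResolveFromMetadata` are all invented names, the call-site cache holds a single guess, and the "metadata analysis" is a stand-in function. The counters are only there so you can see which path each call took.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model of a cached interface call site. Every name is hypothetical.
using TypeId = int;
using SlotId = int;
using Target = int (*)();

int FooImplA() { return 1; }  // class A's implementation of the slot
int FooImplB() { return 2; }  // class B's implementation of the slot

// Stand-in for the slowest tier, "analyze the object metadata".
Target ResolveFromMetadata(TypeId type, SlotId /*slot*/) {
    return type == 0 ? FooImplA : FooImplB;
}

// Second tier: a hash table keyed by (concrete type, interface slot).
std::unordered_map<std::uint64_t, Target> g_resolve_cache;

std::uint64_t MakeKey(TypeId type, SlotId slot) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(type)) << 32) |
           static_cast<std::uint32_t>(slot);
}

struct Stats { int fast = 0; int hashed = 0; int slow = 0; };

struct CallSite {
    bool has_guess = false;
    TypeId guessed_type = 0;          // "I think the object is of this type..."
    Target guessed_target = nullptr;  // "...so the method is here."
    Stats stats;

    int Invoke(TypeId type, SlotId slot) {
        // Tier 1: the call site's own guess.
        if (has_guess && guessed_type == type) {
            ++stats.fast;
            return guessed_target();
        }
        // Tier 2: the shared hash table.
        Target target = nullptr;
        const std::uint64_t key = MakeKey(type, slot);
        auto it = g_resolve_cache.find(key);
        if (it != g_resolve_cache.end()) {
            ++stats.hashed;
            target = it->second;
        } else {
            // Tier 3: full resolution, remembered for next time.
            ++stats.slow;
            target = ResolveFromMetadata(type, slot);
            g_resolve_cache[key] = target;
        }
        // Re-point the call site's guess at what was just seen.
        has_guess = true;
        guessed_type = type;
        guessed_target = target;
        return target();
    }
};
```

In this model, calling one site with the same type repeatedly takes the fast path after the first call; alternating between a handful of types lands on the hash table; and a type never seen before always pays for the full resolution.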
If you want to know how the tables for the slow lookup are maintained in memory, see the link in Matthew Watson's answer.