Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how virtual generic method call is implemented?

Tags:

c#

.net

clr

I'm interesting in how CLR implementes the calls like this:

abstract class A {
    public abstract void Foo<T, U, V>();
}

A a = ...
a.Foo<int, string, decimal>(); // <=== ?

Is this call cause an some kind of hash map lookup by type parameters tokens as the keys and compiled generic method specialization (one for all reference types and the different code for all the value types) as the values?

like image 729
controlflow Avatar asked Jul 04 '11 15:07

controlflow


People also ask

How are generics implemented?

A bytecode language that supports generic types in its metadata system is designed and implemented. At runtime the JIT compiler turns the bytecode into machine code; it is responsible for constructing the appropriate machine code given a generic specialization.

How do you declare a generic method How do you invoke a generic method?

Generic MethodsAll generic method declarations have a type parameter section delimited by angle brackets (< and >) that precedes the method's return type ( < E > in the next example). Each type parameter section contains one or more type parameters separated by commas.

How do you declare a generic method in Java?

For static generic methods, the type parameter section must appear before the method's return type. The complete syntax for invoking this method would be: Pair<Integer, String> p1 = new Pair<>(1, "apple"); Pair<Integer, String> p2 = new Pair<>(2, "pear"); boolean same = Util. <Integer, String>compare(p1, p2);


2 Answers

I didn't find much exact information about this, so much of this answer is based on the excellent paper on .Net generics from 2001 (even before .Net 1.0 came out!), one short note in a follow-up paper and what I gathered from SSCLI v. 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).

Let's start simple: how is a non-generic non-virtual method called? By directly calling the method code, so the compiled code contains direct address. The compiler gets the method address from the method table (see next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it, if it wasn't compiled yet; or it's one instruction that directly calls the compiled code, if it already exists. I'm going to ignore this detail further on.

Now, how is a non-generic virtual method called? Similar to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table and its methods there. So, to call a virtual method, get the reference to this (passed in as a parameter), from there, get the reference to the method table, look at the correct entry in it (the entry number is constant for specific function) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.

Now we need to know about code sharing. Code can be shared between two “instances” of the same method, if reference types in type parameters correspond to any other reference types, and value types are exactly the same. So, for example C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between type type parameters and method type parameters. (The original paper from 2001 mentions that code can be shared also when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)

Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for calling code like new T[]). For this reason, each instantiation of generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two types of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.

For non-virtual generic methods, the type handle is not enough and so they get different hidden argument, MethodDesc, that contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that's static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there, if it already exists with compatible type parameters, or creates a new entry.

Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing type parameters is still present.

One interesting tidbit learned while researching this: because the JITer is very lazy, the following (completely useless) code works:

object Lift<T>(int count) where T : new()
{
    if (count == 0)
        return new T();

    return Lift<List<T>>(count - 1);
}

The equivalent C++ code causes the compiler to give up with a stack overflow.

like image 145
svick Avatar answered Oct 05 '22 23:10

svick


Yes. The code for specific type is generated at the runtime by CLR and keeps a hashtable (or similar) of implementations.

Page 372 of CLR via C#:

When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance. Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List, and a completely different assembly (loaded in the same AppDomain) also uses List, the CLR will compile the methods for List just once. This reduces code explosion substantially.

like image 32
Aliostad Avatar answered Oct 06 '22 01:10

Aliostad