 

Why does a field declaration with a duplicated nested type in a generic class result in a huge increase in source code size?

The scenario is very rare, but quite simple: you define a generic class, then create a nested class that inherits from the outer class, and declare a field of the nested class's own type inside it. The code snippet is simpler than the description:

    class Outer<T>
    {
        class Inner : Outer<Inner>
        {
            Inner field;
        }
    }

After decompiling the IL, the C# code looks like this:

    internal class Outer<T>
    {
        private class Inner : Outer<Outer<T>.Inner>
        {
            private Outer<Outer<T>.Inner>.Inner field;
        }
    }

This seems fair enough, but things get trickier when you change the field's type declaration. For example, when I change the field declaration to

    Inner.Inner field;

After decompilation, this field looks like this:

    private Outer<Outer<Outer<T>.Inner>.Inner>.Inner field;

I understand that class "nestedness" and inheritance don't quite get along with each other, but why do we observe such behavior? Has the Inner.Inner type declaration changed the type at all? Do Inner.Inner and Inner differ in some way in this context?

When things become very tricky

The decompiled source code for the class below is really huge: 12,159 characters in total.

    class X<A, B, C>
    {
        class Y : X<Y, Y, Y>
        {
            Y.Y.Y.Y.Y.Y y;
        }
    }

Finally, this class:

    class X<A, B, C, D, E>
    {
        class Y : X<Y, Y, Y, Y, Y>
        {
            Y.Y.Y.Y.Y.Y.Y.Y.Y y;
        }
    }

results in a 27.9 MB (29,302,272 bytes) assembly and a total build time of 00:43.619.

Tools used

Compilation was done with the C# 5 and C# 4 compilers, and decompilation with dotPeek. Build configurations: Release and Debug.

Ilya Ivanov asked Jan 05 '13 22:01



1 Answer

The core of your question is why Inner.Inner is a different type from Inner. Once you understand that, your observations about compile time and generated IL size follow easily.

The first thing to note is that when you have this declaration

    public class X<T>
    {
        public class Y { }
    }

There are infinitely many types associated with the name Y: one for each generic type argument T. So X<int>.Y is different from X<object>.Y and, importantly for what follows, X<X<T>>.Y is a different type from X<T>.Y for every T. You can test this for various types T.
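A minimal console sketch of that test, reusing the X<T>.Y declaration above; the Program class and the int/object type arguments are arbitrary choices for illustration, not part of the original answer:

```csharp
using System;

public class X<T>
{
    public class Y { }
}

public static class Program
{
    public static void Main()
    {
        // Each closed outer type produces its own distinct nested type Y.
        Console.WriteLine(typeof(X<int>.Y) == typeof(X<object>.Y));   // False
        Console.WriteLine(typeof(X<X<int>>.Y) == typeof(X<int>.Y));   // False

        // The same outer type argument always yields the same closed nested type.
        Console.WriteLine(typeof(X<int>.Y) == typeof(X<int>.Y));      // True
    }
}
```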

The next thing to note is that in

    public class A
    {
        public class B : A { }
    }

There are infinitely many ways to refer to the nested type B. One is A.B, another is A.B.B, and so on. The expression typeof(A.B) == typeof(A.B.B) evaluates to true.
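The same kind of check for this non-generic case, as a small sketch (the Program class is only scaffolding): A.B.B looks up B as a member that A.B inherits from A, which is A.B again. Type.ToString() prints a nested type with a + between the outer and inner names.

```csharp
using System;

public class A
{
    // B derives from its containing class, so it also inherits the nested type B.
    public class B : A { }
}

public static class Program
{
    public static void Main()
    {
        // Every extra ".B" resolves to the very same type.
        Console.WriteLine(typeof(A.B) == typeof(A.B.B));   // True
        Console.WriteLine(typeof(A.B.B.B));                // A+B
    }
}
```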

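When you combine these two, the way you have done, something interesting happens. The type Outer<T>.Inner is not the same type as Outer<T>.Inner.Inner. Looking up Inner as a member of Outer<T>.Inner finds the Inner nested in its base class, so Outer<T>.Inner.Inner is really Outer<Outer<T>.Inner>.Inner: the same nested class, but under a different outer type argument, which we established before makes it a different type. The base classes show the same thing: Outer<T>.Inner is a subclass of Outer<Outer<T>.Inner>, while Outer<T>.Inner.Inner is a subclass of Outer<Outer<Outer<T>.Inner>.Inner>. So Outer<T>.Inner.Inner and Outer<T>.Inner refer to different types.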
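A small reflection check of this reasoning, with T fixed to int (an arbitrary choice) and the nested class made public so it can be named from outside (in the question it is private). The Program class is only scaffolding.

```csharp
using System;

public class Outer<T>
{
    // In the base clause, Inner means this class itself, so the base is Outer<Outer<T>.Inner>.
    public class Inner : Outer<Inner> { }
}

public static class Program
{
    public static void Main()
    {
        Type inner = typeof(Outer<int>.Inner);
        Type innerInner = typeof(Outer<int>.Inner.Inner);

        // Different types, despite the similar-looking names.
        Console.WriteLine(inner == innerInner);                                   // False

        // Inner.Inner is the Inner inherited from the base class Outer<Outer<int>.Inner>.
        Console.WriteLine(innerInner == typeof(Outer<Outer<int>.Inner>.Inner));   // True

        // The two types also have different base classes.
        Console.WriteLine(inner.BaseType == typeof(Outer<Outer<int>.Inner>));     // True
        Console.WriteLine(innerInner.BaseType == typeof(Outer<Outer<Outer<int>.Inner>.Inner>)); // True
    }
}
```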

When generating IL, the compiler always uses fully qualified names for types. You have cleverly found a way to refer to types whose names grow in length at an exponential rate. That is why the output IL size and compile time grow so quickly as you increase the generic arity of Outer or add more .Y levels to the declared type of the field in Inner.
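One rough way to watch this growth is sketched below. It assumes Type.ToString() is a fair proxy for the fully qualified name (it is not the exact IL metadata encoding). With three type parameters, each extra .Y embeds the previous name three times, so each printed length should be a bit more than three times the one before it. The int type argument and the NameLengths helper are hypothetical, chosen only for illustration.

```csharp
using System;

public class X<A, B, C>
{
    // Y passes itself as all three type arguments of its base class.
    public class Y : X<Y, Y, Y> { }
}

public static class Program
{
    // Hypothetical helper: length of Type.ToString() for one to four levels of ".Y".
    public static int[] NameLengths()
    {
        Type[] levels =
        {
            typeof(X<int, int, int>.Y),
            typeof(X<int, int, int>.Y.Y),
            typeof(X<int, int, int>.Y.Y.Y),
            typeof(X<int, int, int>.Y.Y.Y.Y),
        };

        int[] lengths = new int[levels.Length];
        for (int i = 0; i < levels.Length; i++)
        {
            // ToString() spells out every generic argument, so each extra ".Y"
            // repeats the previous level's name three times.
            lengths[i] = levels[i].ToString().Length;
        }
        return lengths;
    }

    public static void Main()
    {
        foreach (int length in NameLengths())
        {
            Console.WriteLine(length);
        }
    }
}
```

Adding a fourth or fifth type parameter would raise the per-level multiplier accordingly, which matches the question's X<A, B, C, D, E> case blowing up far faster than the single-parameter Outer<T>.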

Mike Zboray answered Sep 22 '22 13:09
