Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does adding beforefieldinit drasticly improve the execution speed of generic classes?

I'm working on a proxy and for generic classes with a reference type parameter it was very slow. Especially for generic methods (about 400 ms vs 3200 ms for trivial generic methods that just returned null). I decided to try to see how it would perform if I rewrote the generated class in C#, and it performed much better, about the same performance as my non-generic class code.

Here is the C# class I wrote:: (note I changed by naming scheme but not a heck of a lot)::

namespace TestData
{
    public class TestClassProxy<pR> : TestClass<pR>
    {
        private InvocationHandler<Func<TestClass<pR>, object>> _0_Test;
        private InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>> _1_Test;
        private static readonly InvocationHandler[] _proxy_handlers = new InvocationHandler[] { 
            new InvocationHandler<Func<TestClass<pR>, object>>(new Func<TestClass<pR>, object>(TestClassProxy<pR>.s_0_Test)), 
        new GenericInvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>(typeof(TestClassProxy<pR>), "s_1_Test") };



        public TestClassProxy(InvocationHandler[] handlers)
        {
            if (handlers == null)
            {
                throw new ArgumentNullException("handlers");
            }
            if (handlers.Length != 2)
            {
                throw new ArgumentException("Handlers needs to be an array of 2 parameters.", "handlers");
            }
            this._0_Test = (InvocationHandler<Func<TestClass<pR>, object>>)(handlers[0] ?? _proxy_handlers[0]);
            this._1_Test = (InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>)(handlers[1] ?? _proxy_handlers[1]);
        }


        private object __0__Test()
        {
            return base.Test();
        }

        private object __1__Test<T>(pR local1) where T:IConvertible
        {
            return base.Test<T>(local1);
        }

        public static object s_0_Test(TestClass<pR> class1)
        {
            return ((TestClassProxy<pR>)class1).__0__Test();
        }

        public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
        {
            return ((TestClassProxy<pR>)class1).__1__Test<T>(local1);
        }

        public override object Test()
        {
            return this._0_Test.Target(this);
        }

        public override object Test<T>(pR local1)
        {
             return this._1_Test.Target(this, local1, GenericToken<T>.Token);
        }
    }
}

This is compiles in release mode to the same IL as my generated proxy here is the class that its proxying::

namespace TestData
{
    public class TestClass<R>
    {
        public virtual object Test()
        {
            return default(object);
        }

        public virtual object Test<T>(R r) where T:IConvertible
        {
            return default(object);
        }
    }
}

There was one-exception, I was not setting the beforefieldinit attribute on the type generated. I was just setting the following attributes::public auto ansi

Why did using beforefieldinit make the performance improve so much?

(The only other difference was I wasn't naming my parameters which really didn't matter in the grand scheme of things. The names for methods and fields are scrambled to avoid collision with real methods. GenericToken and InvocationHandlers are implementation details that are irrelevant for sake of argument.
GenericToken is literally used as just a typed data holder as it allows me to send "T" to the handler

InvocationHandler is just a holder for the delegate field target there is no actual implementation detail.

GenericInvocationHandler uses a callsite technique like the DLR to rewrite the delegate as needed to handle the different generic arguments passed )

EDIT:: Here is the test harness::

private static void RunTests(int count = 1 << 24, bool displayResults = true)
{
    var tests = Array.FindAll(Tests, t => t != null);
    var maxLength = tests.Select(x => GetMethodName(x.Method).Length).Max();

    for (int j = 0; j < tests.Length; j++)
    {
        var action = tests[j];
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            action();
        }
        sw.Stop();
        if (displayResults)
        {
            Console.WriteLine("{2}  {0}: {1}ms", GetMethodName(action.Method).PadRight(maxLength),
                              ((int)sw.ElapsedMilliseconds).ToString(), j);
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}

private static string GetMethodName(MethodInfo method)
{
    return method.IsGenericMethod
            ? string.Format(@"{0}<{1}>", method.Name, string.Join<Type>(",", method.GetGenericArguments()))
            : method.Name;
}

And in a test I do the following::

Tests[0] = () => proxiedTestClass.Test();
Tests[1] = () => proxiedTestClass.Test<string>("2");
Tests[2] = () => handClass.Test();
Tests[3] = () => handClass.Test<string>("2");
RunTests(100, false);
RunTests();

Where Tests is a Func<object>[20], and proxiedTestClass is the class generated by my assembly, and handClass is the one I generated by hand. RunTests is called twice, once to "warm" things up, and again to run it and print to the screen. I mostly took this code from a post here by Jon Skeet.

like image 845
Michael B Avatar asked Jan 28 '13 20:01

Michael B


2 Answers

As stated in ECMA-335 (CLI cpecification), part I, section 8.9.5:

The semantics of when and what triggers execution of such type initialization methods, is as follows:

  1. A type can have a type-initializer method, or not.
  2. A type can be specified as having a relaxed semantic for its type-initializer method (for convenience below, we call this relaxed semantic BeforeFieldInit).
  3. If marked BeforeFieldInit then the type’s initializer method is executed at, or sometime before, first access to any static field defined for that type.
  4. If not marked BeforeFieldInit then that type’s initializer method is executed at (i.e., is triggered by):

    a. first access to any static field of that type, or

    b. first invocation of any static method of that type, or

    c. first invocation of any instance or virtual method of that type if it is a value type or

    d. first invocation of any constructor for that type.

Also, as you can see from the Michael's code above, the TestClassProxy has only one static field: _proxy_handlers. Notice, that it is used only two times:

  1. In the instance constructor
  2. And in the static field initializer itself

So when BeforeFieldInit is specified, type-initializer will be called only once: in the instance constructor, right before the first access to _proxy_handlers.

But if BeforeFieldInit is omitted, CLR will place the call to the type-initializer before every TestClassProxy's static method invocation, static field access, etc.

In particular, the type-initializer will be called on every invocation of s_0_Test and s_1_Test<T> static methods.

Of course, as stated in ECMA-334 (C# Language Specification), section 17.11:

The static constructor for a non-generic class executes at most once in a given application domain. The static constructor for a generic class declaration executes at most once for each closed constructed type constructed from the class declaration (§25.1.5).

But in order to guarantee this, CLR have to check (in a thread-safe manner) if the class is already initialized, or not.

And these checks will decrease the performance.

PS: You might be surprised that performance issues will gone once you change s_0_Test and s_1_Test<T> to be instance-methods.

like image 51
Nikolay Khil Avatar answered Nov 12 '22 16:11

Nikolay Khil


First, if you want to learn more about beforefieldinit, read Jon Skeet's article C# and beforefieldinit. Parts of this answer are based on that and I'll repeat the relevant bits here.

Second, your code does very little, so overhead will have significant impact on your measurements. In real code, the impact is likely to be much smaller.

Third, you don't need to use Reflection.Emit to set whether a class has beforefieldint. You can disable that flag in C# by adding a static constructor (e.g. static TestClassProxy() {}).

Now, what beforefieldinit does is that it governs when is the type initializer (method called .cctor) called. In C# terms, type initializer contains all static field initializers and code from the static constructor, if there is one.

If you don't set that flag, the type initializer will be called when either an instance of the class is created or any of the static members of the class are referenced. (Taken from the C# spec, using CLI spec here would be more accurate, but the end result is the same.*)

What this means is that without beforefieldinit, the compiler is very confined about when to call the type initializer, it can't decide to call it a bit earlier, even if doing that would be more convenient (and resulted in faster code).

Knowing this, we can look at what's actually happening in your code. The problematic cases are static methods, because that's where the type initializer might be called. (Instance constructor is another one, but you're not measuring that.)

I focused on the method s_1_Test(). And because I don't actually need it to do anything, I simplified it (to make the generated native code shorter) to:

public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
{
    return null;
}

Now, let's look at the disassembly in VS (in Release mode), first without the static constructor, that is with beforefieldinit:

00000000  xor         eax,eax
00000002  ret

Here, the result is set to 0 (it's done in somewhat obfuscated manner for performance reasons) and the method returns, very simple.

What happens with static the static constructor (i.e. without beforefieldinit)?

00000000  sub         rsp,28h
00000004  mov         rdx,rcx
00000007  xor         ecx,ecx
00000009  call        000000005F8213A0
0000000e  xor         eax,eax
00000010  add         rsp,28h
00000014  ret

This is much more complicated, the real problem is the call instruction, which presumably calls a function that invokes the type initializer if necessary.

I believe this the source of the performance difference between the two situations.

The reason why the added check is necessary is because your type is generic and you're using it with a reference type as a type parameter. In that case, the JITted code for different generic versions of your class is shared, but the type initializer has to be called for each generic version. Moving the static methods to another, non-generic type would be one way to solve the issue.


* Unless you do something crazy like calling instance method on null using call (and not callvirt, which throws for null).

like image 43
svick Avatar answered Nov 12 '22 15:11

svick