I've written two equivalent methods: <pre class="prettyprint"><code>static bool F<T>(T a, T b) where T : class { return a == b; } static bool F2(A a, A b) { return a == b; } </code></pre> Time difference: 00:00:00.0380022 00:00:00.0170009 Code for testing: <pre class="prettyprint"><code>var a = new A(); for (int i = 0; i < 100000000; i++) F<A>(a, a); Console.WriteLine(DateTime.Now - dt); dt = DateTime.Now; for (int i = 0; i < 100000000; i++) F2(a, a); Console.WriteLine(DateTime.Now - dt); </code></pre> Does anyone know why? In a comment below, dtb* show CIL: <pre class="prettyprint"><code>IL for F2: ldarg.0, ldarg.1, ceq, ret. IL for F<T>: ldarg.0, box !!T, ldarg.1, box !!T, ceq, ret. </code></pre> I think it's the answer for my question, but what magic can I use to deny boxing? Next I use code from Psilon: <pre class="prettyprint"><code>using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; namespace ConsoleApplication58 { internal class Program { private class A { } private static bool F<T>(T a, T b) where T : class { return a == b; } private static bool F2(A a, A b) { return a == b; } private static void Main() { const int rounds = 100, n = 10000000; var a = new A(); var fList = new List<TimeSpan>(); var f2List = new List<TimeSpan>(); for (int i = 0; i < rounds; i++) { // Test generic GCClear(); bool res; var sw = new Stopwatch(); sw.Start(); for (int j = 0; j < n; j++) { res = F(a, a); } sw.Stop(); fList.Add(sw.Elapsed); // Test not-generic GCClear(); bool res2; var sw2 = new Stopwatch(); sw2.Start(); for (int j = 0; j < n; j++) { res2 = F2(a, a); } sw2.Stop(); f2List.Add(sw2.Elapsed); } double f1AverageTicks = fList.Average(ts => ts.Ticks); Console.WriteLine("Elapsed for F = {0} \t ticks = {1}", fList.Average(ts => ts.TotalMilliseconds), f1AverageTicks); double f2AverageTicks = f2List.Average(ts => ts.Ticks); Console.WriteLine("Elapsed for F2 = {0} \t ticks = {1}", f2List.Average(ts => ts.TotalMilliseconds), f2AverageTicks); Console.WriteLine("Not-generic method is {0} times faster, or on {1}%", f1AverageTicks/f2AverageTicks, (f1AverageTicks/f2AverageTicks - 1)*100); Console.ReadKey(); } private static void GCClear() { GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); } } } </code></pre> Windows 7, .NET 4.5, Visual Studio 2012, release, optimized, without attaching. x64 <pre class="prettyprint"><code>Elapsed for F = 23.68157 ticks = 236815.7 Elapsed for F2 = 1.701638 ticks = 17016.38 Not-generic method is 13.916925926666 times faster, or on 1291.6925926666% </code></pre> x86 <pre class="prettyprint"><code>Elapsed for F = 6.713223 ticks = 67132.23 Elapsed for F2 = 6.729897 ticks = 67298.97 Not-generic method is 0.997522398931217 times faster, or on -0.247760106878314% </code></pre> And I've got new magic: x64 is three times faster... PS: My target platform is x64.

Your testing method is flawed. There are a few big problems with how you did it. First, you did not provide a "warm-up". In .NET the first time you access something it will be slower than subsequent calls so it can load up any needed assemblies. If you are going to perform tests like this you must do each function at least once or the first test to run will suffer a large penalty. Go ahead and swap the order, you will likely see the opposite results. Second <code>DateTime</code> is only accurate to 16ms, so when comparing two times you have a +/- error of 32 ms. The difference between the two results are 21 ms, well within the experimental error. You must use a more accurate timer like the Stopwatch class. Lastly, don't do artificial tests like this. They don't show you any useful information other than bragging rights for one class or another. Instead learn to use a Code Profiler. This will show you what is slowing down your code and you can make informed decisions on how to solve the problem instead of "guessing" that not using a templated class will make your code faster. Here is a example code that shows how it "should" be done: <pre class="prettyprint"><code>using System; using System.Diagnostics; namespace Sandbox_Console { class A { } internal static class Program { static bool F<T>(T a, T b) where T : class { return a == b; } static bool F2(A a, A b) { return a == b; } private static void Main() { var a = new A(); Stopwatch st = new Stopwatch(); Console.WriteLine("warmup"); st.Start(); for (int i = 0; i < 100000000; i++) F<A>(a, a); Console.WriteLine(st.Elapsed); st.Restart(); for (int i = 0; i < 100000000; i++) F2(a, a); Console.WriteLine(st.Elapsed); Console.WriteLine("real"); st.Restart(); for (int i = 0; i < 100000000; i++) F<A>(a, a); Console.WriteLine(st.Elapsed); st.Restart(); for (int i = 0; i < 100000000; i++) F2(a, a); Console.WriteLine(st.Elapsed); Console.WriteLine("Done"); Console.ReadLine(); } } } </code></pre> And here are the results: <pre class="prettyprint"><code>warmup 00:00:00.0297904 00:00:00.0298949 real 00:00:00.0296838 00:00:00.0297823 Done </code></pre> Swapping the order of the last two the first one is always shorter, so effectively they are the "same time" as it is within the experimental error.

Generic vs not-generic performance in C#

Tags:

performance

c#

templates

generics

I've written two equivalent methods:

static bool F<T>(T a, T b) where T : class {     return a == b; }  static bool F2(A a, A b) {     return a == b; }

Time difference:
00:00:00.0380022
00:00:00.0170009

Code for testing:

var a = new A(); for (int i = 0; i < 100000000; i++)     F<A>(a, a); Console.WriteLine(DateTime.Now - dt);  dt = DateTime.Now; for (int i = 0; i < 100000000; i++)     F2(a, a); Console.WriteLine(DateTime.Now - dt);

Does anyone know why?

In a comment below, dtb* show CIL:

IL for F2: ldarg.0, ldarg.1, ceq, ret. IL for F<T>: ldarg.0, box !!T, ldarg.1, box !!T, ceq, ret.

I think it's the answer for my question, but what magic can I use to deny boxing?

Next I use code from Psilon:

using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq;  namespace ConsoleApplication58 {     internal class Program     {         private class A         {          }          private static bool F<T>(T a, T b) where T : class         {             return a == b;         }          private static bool F2(A a, A b)         {             return a == b;         }          private static void Main()         {             const int rounds = 100, n = 10000000;             var a = new A();             var fList = new List<TimeSpan>();             var f2List = new List<TimeSpan>();             for (int i = 0; i < rounds; i++)             {                 // Test generic                 GCClear();                 bool res;                 var sw = new Stopwatch();                 sw.Start();                 for (int j = 0; j < n; j++)                 {                     res = F(a, a);                 }                 sw.Stop();                 fList.Add(sw.Elapsed);                  // Test not-generic                 GCClear();                 bool res2;                 var sw2 = new Stopwatch();                 sw2.Start();                 for (int j = 0; j < n; j++)                 {                     res2 = F2(a, a);                 }                 sw2.Stop();                 f2List.Add(sw2.Elapsed);             }             double f1AverageTicks = fList.Average(ts => ts.Ticks);             Console.WriteLine("Elapsed for F = {0} \t ticks = {1}", fList.Average(ts => ts.TotalMilliseconds),                               f1AverageTicks);             double f2AverageTicks = f2List.Average(ts => ts.Ticks);             Console.WriteLine("Elapsed for F2 = {0} \t ticks = {1}", f2List.Average(ts => ts.TotalMilliseconds),                   f2AverageTicks);             Console.WriteLine("Not-generic method is {0} times faster, or on {1}%", f1AverageTicks/f2AverageTicks,                               (f1AverageTicks/f2AverageTicks - 1)*100);             Console.ReadKey();         }          private static void GCClear()         {             GC.Collect();             GC.WaitForPendingFinalizers();             GC.Collect();         }     } }

Windows 7, .NET 4.5, Visual Studio 2012, release, optimized, without attaching.

x64

Elapsed for F = 23.68157         ticks = 236815.7 Elapsed for F2 = 1.701638        ticks = 17016.38 Not-generic method is 13.916925926666 times faster, or on 1291.6925926666%

x86

Elapsed for F = 6.713223         ticks = 67132.23 Elapsed for F2 = 6.729897        ticks = 67298.97 Not-generic method is 0.997522398931217 times faster, or on -0.247760106878314%

And I've got new magic: x64 is three times faster...

PS: My target platform is x64.

457

asked Jun 25 '13 20:06

Dmitry

2 Answers

I did make some changes to your code to measure perf correctly.

Use Stopwatch
Execute Release Mode
Prevent Inlining.
Use GetHashCode() to do some real work
Look at the generated Assembly code

Here is the code:

class A { }  [MethodImpl(MethodImplOptions.NoInlining)] static bool F<T>(T a, T b) where T : class {     return a.GetHashCode() == b.GetHashCode(); }  [MethodImpl(MethodImplOptions.NoInlining)] static bool F2(A a, A b) {     return a.GetHashCode() == b.GetHashCode(); }  static int Main(string[] args) {     const int Runs = 100 * 1000 * 1000;     var a = new A();     bool lret = F<A>(a, a);     var sw = Stopwatch.StartNew();     for (int i = 0; i < Runs; i++)     {         F<A>(a, a);     }     sw.Stop();     Console.WriteLine("Generic: {0:F2}s", sw.Elapsed.TotalSeconds);      lret = F2(a, a);     sw = Stopwatch.StartNew();     for (int i = 0; i < Runs; i++)     {         F2(a, a);     }     sw.Stop();     Console.WriteLine("Non Generic: {0:F2}s", sw.Elapsed.TotalSeconds);      return lret ? 1 : 0; }

During my tests the non generic version was slightly faster (.NET 4.5 x32 Windows 7). But there is practically no measurable difference in speed. I would say the are both equal. For completeness here is the assembly code of the generic version: I got the assembly code via the debugger in Release mode with JIT optimizations enabled.The default is to disable JIT optimizations during debugging to make setting breakpoints and variables inspection easier.

Generic

static bool F<T>(T a, T b) where T : class {         return a.GetHashCode() == b.GetHashCode(); }  push        ebp  mov         ebp,esp  push        ebx  sub         esp,8 // reserve stack for two locals  mov         dword ptr [ebp-8],ecx // store first arg on stack mov         dword ptr [ebp-0Ch],edx // store second arg on stack mov         ecx,dword ptr [ebp-8] // get first arg from stack --> stupid! mov         eax,dword ptr [ecx]   // load MT pointer from a instance mov         eax,dword ptr [eax+28h] // Locate method table start call        dword ptr [eax+8] //GetHashCode // call GetHashCode function pointer which is the second method starting from the method table mov         ebx,eax           // store result in ebx mov         ecx,dword ptr [ebp-0Ch] // get second arg mov         eax,dword ptr [ecx]     // call method as usual ... mov         eax,dword ptr [eax+28h]  call        dword ptr [eax+8] //GetHashCode cmp         ebx,eax  sete        al  movzx       eax,al  lea         esp,[ebp-4]  pop         ebx  pop         ebp  ret         4

Non Generic

static bool F2(A a, A b) {   return a.GetHashCode() == b.GetHashCode(); }  push        ebp  mov         ebp,esp  push        esi  push        ebx  mov         esi,edx  mov         eax,dword ptr [ecx]  mov         eax,dword ptr [eax+28h]  call        dword ptr [eax+8] //GetHashCode mov         ebx,eax  mov         ecx,esi  mov         eax,dword ptr [ecx]  mov         eax,dword ptr [eax+28h]  call        dword ptr [eax+8] //GetHashCode cmp         ebx,eax  sete        al  movzx       eax,al  pop         ebx  pop         esi  pop         ebp  ret

As you can see the generic version looks slightly more inefficient due to more stack memoy operations which are not perfect but in reality the difference is not measurable since all is fitting into the L1 cache of the processor which makes the memory operations less costly compared to the pure register operations of the non generic version. I would suspect that the non generic version should perform a little better in real world if you need to pay for real memory access not coming from any CPU cache.

For all practical purposes these both methods are identical. You should look at some other place for real world performance gains. I would first look at the data access patterns and used data structures. Algorithmic changes tend to bring much more perf gain than such low level stuff.

Edit1: If you want to use == then you will find

00000000  push        ebp  00000001  mov         ebp,esp  00000003  cmp         ecx,edx // Check for reference equality  00000005  sete        al  00000008  movzx       eax,al  0000000b  pop         ebp  0000000c  ret         4

both methods produce exactly the same machine code. Any difference you did measure are your measurement errors.

140

answered Oct 08 '22 20:10

Alois Kraus

Your testing method is flawed. There are a few big problems with how you did it.

First, you did not provide a "warm-up". In .NET the first time you access something it will be slower than subsequent calls so it can load up any needed assemblies. If you are going to perform tests like this you must do each function at least once or the first test to run will suffer a large penalty. Go ahead and swap the order, you will likely see the opposite results.

Second DateTime is only accurate to 16ms, so when comparing two times you have a +/- error of 32 ms. The difference between the two results are 21 ms, well within the experimental error. You must use a more accurate timer like the Stopwatch class.

Lastly, don't do artificial tests like this. They don't show you any useful information other than bragging rights for one class or another. Instead learn to use a Code Profiler. This will show you what is slowing down your code and you can make informed decisions on how to solve the problem instead of "guessing" that not using a templated class will make your code faster.

Here is a example code that shows how it "should" be done:

using System; using System.Diagnostics;  namespace Sandbox_Console {     class A     {     }      internal static class Program     {         static bool F<T>(T a, T b) where T : class         {             return a == b;         }          static bool F2(A a, A b)         {             return a == b;         }          private static void Main()         {             var a = new A();             Stopwatch st = new Stopwatch();              Console.WriteLine("warmup");             st.Start();             for (int i = 0; i < 100000000; i++)                 F<A>(a, a);             Console.WriteLine(st.Elapsed);              st.Restart();             for (int i = 0; i < 100000000; i++)                 F2(a, a);             Console.WriteLine(st.Elapsed);              Console.WriteLine("real");             st.Restart();             for (int i = 0; i < 100000000; i++)                 F<A>(a, a);             Console.WriteLine(st.Elapsed);              st.Restart();             for (int i = 0; i < 100000000; i++)                 F2(a, a);             Console.WriteLine(st.Elapsed);              Console.WriteLine("Done");             Console.ReadLine();         }      } }

And here are the results:

warmup 00:00:00.0297904 00:00:00.0298949 real 00:00:00.0296838 00:00:00.0297823 Done

Swapping the order of the last two the first one is always shorter, so effectively they are the "same time" as it is within the experimental error.

answered Oct 08 '22 19:10

Scott Chamberlain

Related questions
                            
                                How Thread-Safe is NLog?
                            
                                How can i set a default value for a dependency property of type derived from dependencyobject
                            
                                "Root element is missing" error but I have a root element
                            
                                Setup method in Moq, ambiguous call
                            
                                Passing object messages in Azure Queue Storage
                            
                                How to correctly represent a whitespace character
                            
                                Self hosted OWIN and urlacl
                            
                                Web API / OWIN, SignalR & Authorization
                            
                                Setting application icon in Explorer?
                            
                                What is the equivalent of [Serializable] in .NET Core ? (Conversion Projects)
                            
                                Injecting DbContext into service layer
                            
                                C# Is there an Exception overview?
                            
                                Why use Decimal.Multiply vs operator multiply?
                            
                                Cannot step into a method returning IEnumerable<T>?
                            
                                C# Namespaces and Assemblies Best Practice
                            
                                The context cannot be used while the model is being created
                            
                                Understanding code first virtual properties
                            
                                Programming against multiple interfaces
                            
                                How is local determined in ToLocalTime()
                            
                                algorithm - Is the RijndaelManaged Class in C# equivalent to AES encryption?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With