
What optimization hints can I give to the compiler/JIT?

I've already profiled, and am now looking to squeeze every possible bit of performance possible out of my hot-spot.

I know about [MethodImplOptions.AggressiveInlining] and the ProfileOptimization class. Are there any others?


[Edit] I just discovered [TargetedPatchingOptOut] as well. Nevermind, apparently that one is not needed.
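For context, a minimal sketch of how the two hints mentioned above are applied; the profile root path and profile file name are placeholder values:

```csharp
using System;
using System.Runtime;
using System.Runtime.CompilerServices;

static class JitHints
{
    static void Main()
    {
        // Opt in to multi-core background JIT / startup profiling.
        // The directory and profile name here are placeholders.
        ProfileOptimization.SetProfileRoot(@"C:\MyApp\JitProfiles");
        ProfileOptimization.StartProfile("Startup.profile");

        Console.WriteLine(Square(21));  // 441
    }

    // A hint, not a guarantee: the JIT may still decline to inline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int Square(int x) => x * x;
}
```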

Asked Apr 30 '13 by BlueRaja - Danny Pflughoeft



2 Answers

Yes, there are more tricks :-)

I've actually done quite a bit of research on optimizing C# code. So far, these are the most significant findings:

  1. Funcs and Actions that are passed directly are often inlined by the JIT'ter. Note that you shouldn't store them in a variable first, because then they are invoked as delegates. See also this post for more details.
  2. Be careful with overloads. Calling Equals without using IEquatable<T> is usually a bad plan - so if you use e.g. a hash table, be sure to implement the right overloads and interfaces, because it'll save you a ton of performance.
  3. Generics called from other classes are never inlined. The reason for this is the "magic" outlined here.
  4. If you use a data structure, make sure to try using an array instead :-) Really, these things are fast as hell compared to... well, just about anything, I suppose. I've optimized quite a few things by using my own hash tables and by using arrays instead of lists.
  5. In a lot of cases, table lookups are faster than computing things or using constructions like vtable lookups, switches, multiple if statements and even calculations. This is also a good trick if you have branches; failed branch prediction can often become a big pain. See also this post - this is a trick I use quite a lot in C# and it works great in many cases. Oh, and lookup tables are arrays, of course.
  6. Experiment with making (small) classes structs. Because of the nature of value types, some optimizations are different for structs than for classes. For example, method calls are simpler, because the compiler knows exactly which method is going to be called. Also, arrays of structs are usually faster than arrays of classes, because they require one less memory operation per array access.
  7. Don't use multi-dimensional arrays. While I prefer Foo[], even Foo[][] is normally faster than Foo[,].
  8. If you're copying data, prefer Buffer.BlockCopy over Array.Copy any day of the week. Also be cautious around strings: string operations can be a performance drain.
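As an illustration of point 5, here's a hypothetical bit-counting example (the names are made up for this sketch) where a 256-entry lookup table replaces a branchy loop with a single array load:

```csharp
using System;

static class LookupDemo
{
    // Branchy version: on random input the branch predictor fails constantly.
    static int BitCountBranchy(byte b)
    {
        int n = 0;
        for (int i = 0; i < 8; ++i)
            if ((b & (1 << i)) != 0) n++;
        return n;
    }

    // Table version: one array load, no data-dependent branches.
    static readonly byte[] Table = BuildTable();

    static byte[] BuildTable()
    {
        var t = new byte[256];
        for (int i = 0; i < 256; ++i)
            t[i] = (byte)BitCountBranchy((byte)i);
        return t;
    }

    static int BitCountTable(byte b) => Table[b];

    static void Main()
    {
        Console.WriteLine(BitCountTable(0xF0));  // 4
    }
}
```

The table is paid for once up front and then amortized over every lookup; whether it wins in practice depends on how hot the code is and whether the table stays in cache.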

There also used to be a guide called "Optimization for the Intel Pentium Processor" with a large number of tricks (like shifting or multiplying instead of dividing). While the compiler does a fine job nowadays, this sometimes still helps a bit.
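For illustration, the shift-instead-of-divide trick from that guide looks like this in C# (note the JIT already performs this for constant divisors, so it mostly matters when the compiler can't prove it safe):

```csharp
using System;

class ShiftDemo
{
    static void Main()
    {
        int x = 1234;
        Console.WriteLine(x / 8);   // general integer division
        Console.WriteLine(x >> 3);  // shift by 3 = divide by 8, same result here
        // Caution: the two only agree for non-negative x. For negative x,
        // '/' rounds toward zero while '>>' rounds toward negative infinity.
    }
}
```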

Of course these are just micro-optimizations; the biggest performance gains usually come from changing the algorithm and/or data structure. Be sure to check which options are available to you, and don't restrict yourself too much to the .NET framework. I also have a natural tendency to distrust the .NET implementation until I've checked the decompiled code myself; there's a ton of stuff that could have been implemented much faster (most of the time for good reasons).

HTH


Alex pointed out to me that Array.Copy is actually faster according to some people. And since I really don't know what has changed over the years, I decided that the only proper course of action was to create a fresh benchmark and put it to the test.

If you're just interested in the conclusion: in most cases the call to Buffer.BlockCopy clearly outperforms Array.Copy. Tested on an Intel Skylake with 16 GB memory (>10 GB free) on .NET 4.5.2.

Code:

using System;
using System.Diagnostics;

static void TestNonOverlapped1(int K)
{
    long total = 1000000000;
    long iter = total / K;
    byte[] tmp = new byte[K];
    byte[] tmp2 = new byte[K];
    for (long i = 0; i < iter; ++i)
    {
        Array.Copy(tmp, tmp2, K);
    }
}

static void TestNonOverlapped2(int K)
{
    long total = 1000000000;
    long iter = total / K;
    byte[] tmp = new byte[K];
    byte[] tmp2 = new byte[K];
    for (long i = 0; i < iter; ++i)
    {
        Buffer.BlockCopy(tmp, 0, tmp2, 0, K);
    }
}

static void TestOverlapped1(int K)
{
    long total = 1000000000;
    long iter = total / K;
    byte[] tmp = new byte[K + 16];
    for (long i = 0; i < iter; ++i)
    {
        Array.Copy(tmp, 0, tmp, 16, K);
    }
}

static void TestOverlapped2(int K)
{
    long total = 1000000000;
    long iter = total / K;
    byte[] tmp = new byte[K + 16];
    for (long i = 0; i < iter; ++i)
    {
        Buffer.BlockCopy(tmp, 0, tmp, 16, K);
    }
}

static void Main(string[] args)
{
    for (int i = 0; i < 10; ++i)
    {
        int N = 16 << i;

        Console.WriteLine("Block size: {0} bytes", N);

        Stopwatch sw = Stopwatch.StartNew();

        {
            sw.Restart();
            TestNonOverlapped1(N);
            Console.WriteLine("Non-overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
            GC.Collect(GC.MaxGeneration);
            GC.WaitForFullGCComplete();
        }

        {
            sw.Restart();
            TestNonOverlapped2(N);
            Console.WriteLine("Non-overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
            GC.Collect(GC.MaxGeneration);
            GC.WaitForFullGCComplete();
        }

        {
            sw.Restart();
            TestOverlapped1(N);
            Console.WriteLine("Overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
            GC.Collect(GC.MaxGeneration);
            GC.WaitForFullGCComplete();
        }

        {
            sw.Restart();
            TestOverlapped2(N);
            Console.WriteLine("Overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
            GC.Collect(GC.MaxGeneration);
            GC.WaitForFullGCComplete();
        }

        Console.WriteLine("-------------------------");
    }

    Console.ReadLine();
}

Results on x86 JIT:

Block size: 16 bytes
Non-overlapped Array.Copy: 4267.52 ms
Non-overlapped Buffer.BlockCopy: 2887.05 ms
Overlapped Array.Copy: 3305.01 ms
Overlapped Buffer.BlockCopy: 2670.18 ms
-------------------------
Block size: 32 bytes
Non-overlapped Array.Copy: 1327.55 ms
Non-overlapped Buffer.BlockCopy: 763.89 ms
Overlapped Array.Copy: 2334.91 ms
Overlapped Buffer.BlockCopy: 2158.49 ms
-------------------------
Block size: 64 bytes
Non-overlapped Array.Copy: 705.76 ms
Non-overlapped Buffer.BlockCopy: 390.63 ms
Overlapped Array.Copy: 1303.00 ms
Overlapped Buffer.BlockCopy: 1103.89 ms
-------------------------
Block size: 128 bytes
Non-overlapped Array.Copy: 361.18 ms
Non-overlapped Buffer.BlockCopy: 219.77 ms
Overlapped Array.Copy: 620.21 ms
Overlapped Buffer.BlockCopy: 577.20 ms
-------------------------
Block size: 256 bytes
Non-overlapped Array.Copy: 192.92 ms
Non-overlapped Buffer.BlockCopy: 108.71 ms
Overlapped Array.Copy: 347.63 ms
Overlapped Buffer.BlockCopy: 353.40 ms
-------------------------
Block size: 512 bytes
Non-overlapped Array.Copy: 104.69 ms
Non-overlapped Buffer.BlockCopy: 65.65 ms
Overlapped Array.Copy: 211.77 ms
Overlapped Buffer.BlockCopy: 202.94 ms
-------------------------
Block size: 1024 bytes
Non-overlapped Array.Copy: 52.93 ms
Non-overlapped Buffer.BlockCopy: 38.84 ms
Overlapped Array.Copy: 144.39 ms
Overlapped Buffer.BlockCopy: 154.09 ms
-------------------------
Block size: 2048 bytes
Non-overlapped Array.Copy: 45.64 ms
Non-overlapped Buffer.BlockCopy: 30.11 ms
Overlapped Array.Copy: 118.33 ms
Overlapped Buffer.BlockCopy: 109.16 ms
-------------------------
Block size: 4096 bytes
Non-overlapped Array.Copy: 30.93 ms
Non-overlapped Buffer.BlockCopy: 30.72 ms
Overlapped Array.Copy: 119.73 ms
Overlapped Buffer.BlockCopy: 104.66 ms
-------------------------
Block size: 8192 bytes
Non-overlapped Array.Copy: 30.37 ms
Non-overlapped Buffer.BlockCopy: 26.63 ms
Overlapped Array.Copy: 90.46 ms
Overlapped Buffer.BlockCopy: 87.40 ms
-------------------------

Results on x64 JIT:

Block size: 16 bytes
Non-overlapped Array.Copy: 1252.71 ms
Non-overlapped Buffer.BlockCopy: 694.34 ms
Overlapped Array.Copy: 701.27 ms
Overlapped Buffer.BlockCopy: 573.34 ms
-------------------------
Block size: 32 bytes
Non-overlapped Array.Copy: 995.47 ms
Non-overlapped Buffer.BlockCopy: 654.70 ms
Overlapped Array.Copy: 398.48 ms
Overlapped Buffer.BlockCopy: 336.86 ms
-------------------------
Block size: 64 bytes
Non-overlapped Array.Copy: 498.86 ms
Non-overlapped Buffer.BlockCopy: 329.15 ms
Overlapped Array.Copy: 218.43 ms
Overlapped Buffer.BlockCopy: 179.95 ms
-------------------------
Block size: 128 bytes
Non-overlapped Array.Copy: 263.00 ms
Non-overlapped Buffer.BlockCopy: 196.71 ms
Overlapped Array.Copy: 137.21 ms
Overlapped Buffer.BlockCopy: 107.02 ms
-------------------------
Block size: 256 bytes
Non-overlapped Array.Copy: 144.31 ms
Non-overlapped Buffer.BlockCopy: 101.23 ms
Overlapped Array.Copy: 85.49 ms
Overlapped Buffer.BlockCopy: 69.30 ms
-------------------------
Block size: 512 bytes
Non-overlapped Array.Copy: 76.76 ms
Non-overlapped Buffer.BlockCopy: 55.31 ms
Overlapped Array.Copy: 61.99 ms
Overlapped Buffer.BlockCopy: 54.06 ms
-------------------------
Block size: 1024 bytes
Non-overlapped Array.Copy: 44.01 ms
Non-overlapped Buffer.BlockCopy: 33.30 ms
Overlapped Array.Copy: 53.13 ms
Overlapped Buffer.BlockCopy: 51.36 ms
-------------------------
Block size: 2048 bytes
Non-overlapped Array.Copy: 27.05 ms
Non-overlapped Buffer.BlockCopy: 25.57 ms
Overlapped Array.Copy: 46.86 ms
Overlapped Buffer.BlockCopy: 47.83 ms
-------------------------
Block size: 4096 bytes
Non-overlapped Array.Copy: 29.11 ms
Non-overlapped Buffer.BlockCopy: 25.12 ms
Overlapped Array.Copy: 45.05 ms
Overlapped Buffer.BlockCopy: 47.84 ms
-------------------------
Block size: 8192 bytes
Non-overlapped Array.Copy: 24.95 ms
Non-overlapped Buffer.BlockCopy: 21.52 ms
Overlapped Array.Copy: 43.81 ms
Overlapped Buffer.BlockCopy: 43.22 ms
-------------------------
Answered Oct 17 '22 by atlaste


You've exhausted the options added in .NET 4.5 to affect the jitted code directly. The next step is to look at the generated machine code to spot any obvious inefficiencies. Do so with the debugger, but first prevent it from disabling the optimizer: Tools + Options, Debugging, General, untick the "Suppress JIT optimization on module load" option. Set a breakpoint on the hot code, then use Debug + Disassembly to look at it.

There are not that many things to consider; the jitter's optimizer in general does an excellent job. One thing to look for is a failed attempt at eliminating an array bounds check; the fixed keyword is an unsafe workaround for that. A corner case is a failed attempt at inlining a method where the jitter doesn't use CPU registers effectively, an issue with the x86 jitter that can be worked around with MethodImplOptions.NoInlining. The optimizer is not terribly efficient at hoisting invariant code out of a loop, but that's something you'd almost always consider first when staring at the C# code looking for ways to optimize it.
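A minimal sketch of the fixed workaround mentioned above (a hypothetical method, requiring the project to allow unsafe code; note that for the plain i < data.Length loop pattern the jitter usually eliminates the bounds check on its own, so this only pays off in the cases where that analysis fails):

```csharp
using System;

class FixedDemo
{
    // Pinning the array lets us index through a raw pointer,
    // which bypasses the per-element bounds check.
    static unsafe long Sum(int[] data)
    {
        long sum = 0;
        fixed (int* p = data)
        {
            for (int i = 0; i < data.Length; ++i)
                sum += p[i];   // pointer access: no bounds check emitted
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3, 4 }));  // 10
    }
}
```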

The most important thing to know is when you are done and just can't hope to make it any faster. You can only really get there by comparing apples to oranges: writing the hot code in native code using C++/CLI. Make sure that this code is compiled with #pragma unmanaged in effect so it gets the full optimizer love. There's a cost associated with switching from managed to native code execution, so do make sure the execution time of the native code is substantial enough. This is otherwise not necessarily easy to do, and you certainly won't have a guarantee of success. Albeit that knowing you are done can save you a lot of time stumbling into dead alleys.

Answered Oct 17 '22 by Hans Passant