
Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3-way partitioning in C#.

Apart from needing to add an extra range check on the left/right variables before the recursive call, it appears to work quite well.
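
For reference, here is a minimal sketch of what a 3-way partitioning (Dutch national flag) Quicksort can look like in C#. This is an illustration rather than the question's actual code; the method names and the exact placement of the range check are assumptions.

    // Sketch of a 3-way partitioning Quicksort: elements equal to the pivot
    // are gathered in the middle and excluded from the recursive calls.
    static void QuickSort3Way(int[] a, int left, int right)
    {
        if (left >= right) return;   // range check before recursing

        int pivot = a[left + (right - left) / 2];
        int lt = left, gt = right, i = left;

        while (i <= gt)
        {
            if (a[i] < pivot)      Swap(a, lt++, i++);
            else if (a[i] > pivot) Swap(a, i, gt--);
            else                   i++;
        }

        QuickSort3Way(a, left, lt - 1);   // everything < pivot
        QuickSort3Way(a, gt + 1, right);  // everything > pivot
    }

    static void Swap(int[] a, int i, int j)
    {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }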

I knew beforehand that the framework provides a built-in Quicksort implementation in List<>.Sort (via Array.Sort). So I tried some basic profiling to compare performance. Results: The built-in List<>.Sort method, operating on the same lists, performs about 10 times faster than my own manual implementation.

Using Reflector, I found that the actual sorting in List<>.Sort is implemented in external code, not IL (in a function named tryszsort()).

Looking at my own Quicksort implementation, I would expect that replacing the recursive calls with iteration might give some improvement. Disabling array bounds checking (if that's possible) could also give some benefit. Maybe this would get some way closer to the built-in implementation, but I'm not confident.
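
As an illustration of the "replace recursion with iteration" idea (a sketch assuming the same 3-way partition as above, not the question's code), the pending sub-ranges can be kept on an explicit stack:

    using System.Collections.Generic;

    static class IterativeQuickSort
    {
        // Sketch: the recursion replaced by an explicit stack of pending ranges.
        public static void Sort(int[] a)
        {
            Stack<int> pending = new Stack<int>();
            pending.Push(0);
            pending.Push(a.Length - 1);

            while (pending.Count > 0)
            {
                int right = pending.Pop();
                int left = pending.Pop();
                if (left >= right) continue;

                int lt, gt;
                Partition3Way(a, left, right, out lt, out gt);

                // Defer both sub-ranges instead of recursing into them.
                pending.Push(left);   pending.Push(lt - 1);
                pending.Push(gt + 1); pending.Push(right);
            }
        }

        // The same 3-way partition as in the recursive sketch, factored out.
        static void Partition3Way(int[] a, int left, int right, out int lt, out int gt)
        {
            int pivot = a[left + (right - left) / 2];
            lt = left; gt = right;
            int i = left;
            while (i <= gt)
            {
                if (a[i] < pivot)      { int t = a[lt]; a[lt] = a[i]; a[i] = t; lt++; i++; }
                else if (a[i] > pivot) { int t = a[i]; a[i] = a[gt]; a[gt] = t; gt--; }
                else                   i++;
            }
        }
    }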

So my question: is it realistic to expect that an optimised algorithm written in .NET IL and JIT-compiled to native code can compete with the performance of an externally implemented algorithm?

Once again, I realise Quicksort is provided as part of the framework; this was just a learning experience for me. However, there are many algorithms (CRC32 comes to mind) that are not provided but could still be of much value to many applications. Here's a related question regarding implementing CRC32 in .NET and its performance issues.

So if you need to implement such an algorithm in .NET, what are the major performance considerations to understand, so that your algorithm can at least approach the performance of external code?

[Update]

I have improved execution speed to within about 10% of the built-in Array.Sort by changing the algorithm to operate on a simple int[] array instead of a List<int>. In Reflector, I can see this avoids a callvirt instruction on every get or set on the list. I thought this might improve things, but I'm surprised by how much it did.
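
For illustration (not the poster's code), the difference boils down to how element access compiles: indexing an int[] turns into direct ldelem/stelem IL instructions, while indexing a List<int> goes through its get_Item/set_Item accessors, which the C# compiler emits as callvirt:

    using System.Collections.Generic;

    static class SwapIllustration
    {
        // Illustrative only: the same swap written against int[] and List<int>.
        static void Swap(int[] a, int i, int j)
        {
            int tmp = a[i];   // ldelem / stelem: direct element access
            a[i] = a[j];
            a[j] = tmp;
        }

        static void Swap(List<int> a, int i, int j)
        {
            int tmp = a[i];   // callvirt List`1.get_Item
            a[i] = a[j];      // callvirt get_Item, then callvirt set_Item
            a[j] = tmp;       // callvirt List`1.set_Item
        }
    }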

asked Nov 13 '09 by Ash


3 Answers

By using non-recursive code and, especially, by using unsafe blocks and pointer arithmetic (if applicable), you could actually see a 5x or 10x performance gain with an algorithm written in C#. As always with performance (and even more so in a managed environment), you never quite know until you try it and benchmark it.
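
As a hedged sketch of the unsafe/pointer-arithmetic point (illustrative, not the answerer's code), a Hoare-style quicksort can pin the array once and walk it with pointers, so the scan loops carry no per-element bounds checks (compile with /unsafe):

    // Sketch: pointer-based quicksort over a pinned int[]; requires /unsafe.
    static unsafe void QuickSortUnsafe(int[] a)
    {
        if (a.Length < 2) return;
        fixed (int* p = a)                   // pin the array so the GC can't move it
        {
            Sort(p, p + a.Length - 1);
        }
    }

    static unsafe void Sort(int* left, int* right)
    {
        if (left >= right) return;

        int pivot = *(left + (right - left) / 2);
        int* lo = left;
        int* hi = right;
        while (lo <= hi)
        {
            while (*lo < pivot) lo++;        // no bounds checks on these scans
            while (*hi > pivot) hi--;
            if (lo <= hi)
            {
                int tmp = *lo; *lo = *hi; *hi = tmp;
                lo++; hi--;
            }
        }
        Sort(left, hi);
        Sort(lo, right);
    }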

Now, generally speaking, you should mostly write stuff in C#, then optimize it, optimize it some more, and if it's still not good enough, identify the exact critical piece of code and port it to C++ (while being careful about limiting the number of managed/native call boundaries).
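
If porting a hot path to C++ ever becomes necessary, the interop side would typically be a single P/Invoke call covering the whole buffer rather than one call per element. A sketch, where the native library name and export are hypothetical:

    using System.Runtime.InteropServices;

    static class NativeSort
    {
        // Hypothetical native export: void sort_ints(int* data, int length);
        [DllImport("fastsort.dll", CallingConvention = CallingConvention.Cdecl)]
        static extern void sort_ints(int[] data, int length);

        public static void Sort(int[] data)
        {
            // One managed/native transition for the whole array, not one per element.
            sort_ints(data, data.Length);
        }
    }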

answered by Ludovic Chabant


Just out of curiosity (despite my 9 years of experience with .NET, I still constantly make this mistake): did you compile your code in Release mode with optimizations on? Debug builds perform significantly worse than optimized Release builds.
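
A quick way to guard against that mistake (a sketch, not from the answer): build with the Release configuration (or csc /optimize+) and time with Stopwatch, warning yourself if the DEBUG symbol is defined or a debugger is attached:

    using System;
    using System.Diagnostics;

    static class Timing
    {
        public static TimeSpan Measure(Action action)
        {
    #if DEBUG
            Console.WriteLine("Warning: DEBUG build - timings will be misleading.");
    #endif
            if (Debugger.IsAttached)
                Console.WriteLine("Warning: debugger attached - the JIT may skip optimizations.");

            action();                        // warm-up run so JIT time isn't measured
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            return sw.Elapsed;
        }
    }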

Assuming you did compile in Release mode, there shouldn't be a huge difference in performance if you implement the algorithm similarly (i.e. iterative vs. iterative or recursive vs. recursive). If you wish to see the .NET implementation and figure out how it works, you can download the SSCLI, the Shared Source Common Language Infrastructure. This is Microsoft's publicly available, ECMA-standard-compliant CLI implementation. It is not 100% of the .NET Framework we all know and love, but it is a significant portion of it, and it can provide a lot of information that Reflector can't, including internal implementations. All kinds of code are available, including C#, C++, and even some assembler in a couple of cases.

answered by jrista


Make sure you're comparing apples to apples.

When sorting, it's possible for the Compare function to be dominant, and that might differ between the implementations.

Assuming the Compare function in both cases is fast enough not to be an issue, the time can be dominated by things like array bounds checking, which can easily make a big difference.
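
To make the "apples to apples" point concrete (an illustration, with a hypothetical GetData helper): Array.Sort on an int[] takes a specialized path with inline integer comparisons, while supplying a Comparison<int> delegate or an IComparer<int> puts a call into the inner loop for every single comparison, which can easily dominate the measurement:

    int[] data = GetData();                  // hypothetical helper supplying the input

    // 1. Specialized path: direct int comparisons inside the sort.
    Array.Sort(data);

    // 2. Delegate comparer: one delegate invocation per comparison.
    Array.Sort(data, (x, y) => x.CompareTo(y));

    // 3. Interface comparer: one interface call per comparison.
    Array.Sort(data, Comparer<int>.Default);

In a real comparison each variant would of course be timed against its own copy of the data, and the manual implementation should be fed the same inputs.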

answered by Mike Dunlavey