 

My 32 bit headache is now a 64bit migraine?!? (or 64bit .NET CLR Runtime issues)

Tags: c#, .net, vb.net, clr, jit

What unusual, unexpected consequences have occurred in terms of performance, memory, etc when switching from running your .NET applications under the 64 bit JIT vs. the 32 bit JIT? I'm interested in the good, but more interested in the surprisingly bad issues people have run into.

I am in the process of writing a new .NET application which will be deployed in both 32-bit and 64-bit form. There have been many questions relating to the issues with porting the application - I am unconcerned with the "gotchas" from a programming/porting standpoint. (i.e.: handling native/COM interop correctly, reference types embedded in structs changing the size of the struct, etc.)
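For context, which JIT the process actually runs under can be checked at runtime. A minimal sketch (Environment.Is64BitProcess only arrived in .NET 4, so IntPtr.Size is the usual check in the .NET 3.5 era):

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // IntPtr.Size is 4 under the 32-bit JIT and 8 under the 64-bit JIT.
        Console.WriteLine(IntPtr.Size == 8 ? "64-bit JIT" : "32-bit JIT");
    }
}
```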

However, this question and its answer got me thinking - What other issues am I overlooking?

There have been many questions and blog posts that skirt around this issue, or hit one aspect of it, but I haven't seen anything that's compiled a decent list of problems.

In particular - my application is very CPU bound and has huge memory usage patterns (hence the need for 64-bit in the first place), as well as being graphical in nature. I'm concerned with what other hidden issues may exist in the CLR or JIT running on 64-bit Windows (using .NET 3.5 SP1).

Here are a few issues I'm currently aware of:

  • (Now I know that) Properties, even automatic properties, don't get inlined in x64.
  • The memory profile of the application changes, both because of the size of references, but also because the memory allocator has different performance characteristics
  • Startup times can suffer on x64
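To illustrate the first bullet: since the x64 JIT of that era may not inline even trivial property accessors, one common workaround is hoisting property reads into locals before a hot loop. This is a hedged sketch with a hypothetical class and member names, not code from the question:

```csharp
using System;

class PropertyHoistDemo
{
    // Simple automatic properties; on the .NET 3.5-era x64 JIT these
    // accessors may not be inlined, so each access in a hot loop
    // potentially costs a call.
    public int Width { get; set; }
    public int Height { get; set; }

    public long SumPixels(byte[] pixels)
    {
        // Workaround: read each property once into a local, then use
        // the locals inside the loop.
        int width = Width;
        int height = Height;
        long sum = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                sum += pixels[y * width + x];
        return sum;
    }

    static void Main()
    {
        var demo = new PropertyHoistDemo { Width = 2, Height = 2 };
        long total = demo.SumPixels(new byte[] { 1, 2, 3, 4 });
        Console.WriteLine(total); // prints 10
    }
}
```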

I'd like to know what other, specific, issues people have discovered in the JIT on 64bit Windows, and also if there are any workarounds for performance.

Thank you all!

----EDIT-----

Just to clarify -

I am aware that optimizing early is often bad. I am aware that second-guessing the system is often bad. I also know that portability to 64-bit has its own issues - we run and test on 64-bit systems daily to help with this.

My application, however, is not your typical business application. It's a scientific software application. We have many processes that sit using 100% CPU on all of the cores (it's highly threaded) for hours at a time.

I spend a LOT of time profiling the application, and that makes a huge difference. However, most profilers disable many features of the JIT, so the small details in things like memory allocation, inlining in the JIT, etc, can be very difficult to pin down when you're running under a profiler. Hence my need for the question.

asked Mar 11 '09 by Reed Copsey


2 Answers

A particularly troublesome performance problem in .NET relates to the poor JIT:

https://connect.microsoft.com/VisualStudio/feedback/details/93858/struct-methods-should-be-inlined?wa=wsignin1.0

Basically, inlining and structs don't work well together on x64 (although that page suggests inlining now works but that subsequent redundant copies aren't eliminated; that claim sounds suspect given the tiny performance difference).
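As a hedged sketch of the struct-inlining issue: a small struct method that would be a natural inlining candidate, next to the manually inlined field arithmetic people fall back to in hot paths. Vec2 and its members are made up for the example:

```csharp
using System;

struct Vec2
{
    public double X, Y;
    public Vec2(double x, double y) { X = x; Y = y; }

    // A tiny method like this is a natural inlining candidate, but the
    // x64 JIT of that era often refused to inline methods with struct
    // parameters or struct return values.
    public static Vec2 Add(Vec2 a, Vec2 b)
    {
        return new Vec2(a.X + b.X, a.Y + b.Y);
    }
}

class StructInlineDemo
{
    static void Main()
    {
        var a = new Vec2(1, 2);
        var b = new Vec2(3, 4);

        // Idiomatic version: goes through the (possibly non-inlined) method.
        Vec2 viaMethod = Vec2.Add(a, b);

        // Manually inlined version: operate on the fields directly in hot
        // loops to sidestep the missed inlining entirely.
        double x = a.X + b.X;
        double y = a.Y + b.Y;

        Console.WriteLine(viaMethod.X == x && viaMethod.Y == y); // prints True
    }
}
```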

In any case, after wrestling with .NET long enough over this, my solution is to use C++ for anything numerically intensive. Even in "good" cases for .NET - where you're not dealing with structs and are using arrays where the bounds checking is optimized out - C++ beats .NET hands down.

If you're doing anything more complicated than dot products, the picture gets worse very quickly; the .NET code is both longer and less readable (because you need to manually inline stuff and/or can't use generics), and much slower.

I've switched to using Eigen in C++: it's absolutely great, resulting in readable code and high performance; a thin C++/CLI wrapper then provides the glue between the compute engine and the .NET world.

Eigen works by template meta-programming: it compiles vector expressions into SSE intrinsic instructions and does a lot of the nastiest cache-related loop unrolling and rearranging for you; and though focused on linear algebra, it'll work with integers and non-matrix array expressions too.

So, for instance, if P is a matrix, this kind of stuff Just Works:

1.0 / (P.transpose() * P).diagonal().sum();

...which doesn't allocate a temporary transposed copy of P, and doesn't compute the whole matrix product but only the diagonal entries it needs.

So, if you can run in Full Trust - just use C++ via C++/CLI; it works much, much better.

answered Nov 17 '22 by Eamon Nerbonne


I remember hearing about an issue on an IRC channel I frequent: the x64 JIT optimises away the temporary copy in this instance:

EventHandler temp = SomeEvent;
if(temp != null)
{
    temp(this, EventArgs.Empty);
}

This reintroduces the race condition and can cause a NullReferenceException if the last handler is unsubscribed between the (elided) copy and the invocation.
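One well-known way to sidestep this entirely, regardless of what the JIT does with the local copy, is to initialise the event with an empty delegate so the backing field is never null. This is a hedged sketch, not from the original answer; the cost is one no-op invocation per raise:

```csharp
using System;

class Publisher
{
    // The empty delegate guarantees the field is never null, so raising
    // needs no null check and no racy temporary copy.
    public event EventHandler SomeEvent = delegate { };

    public void Fire()
    {
        SomeEvent(this, EventArgs.Empty);
    }

    static void Main()
    {
        var p = new Publisher();
        p.Fire(); // safe even with no subscribers
        p.SomeEvent += (s, e) => Console.WriteLine("handled");
        p.Fire(); // prints "handled"
    }
}
```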

answered Nov 17 '22 by Quibblesome