Have encounterred a situation where a simple .net fibonniacci code is slower on a particular set of servers and the only thing that is obviously different is the CPU. AMD Opteron Processor 6276 - 11 secs Intel Xeon XPU E7 - 4850 - 7 secs
Code is complied for x86 and using .NET framework 4.0. -Clock speeds between both is similar and in fact PassMark benchmarks gives highesr scores for AMD. -Have tried this on other AMD servers in the farm and the times are slower. -Even my local I7 machines runs the code faster.
Fibonnacci code:
class Program
{
static void Main(string[] args)
{
const int ITERATIONS = 10000;
const int FIBONACCI = 100000;
var watch = new Stopwatch();
watch.Start();
DoFibonnacci(ITERATIONS, FIBONACCI);
watch.Stop();
Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
private static void DoFibonnacci(int ITERATIONS, int FIBONACCI)
{
for (int i = 0; i < ITERATIONS; i++)
{
Fibonacci(FIBONACCI);
}
}
private static int Fibonacci(int x)
{
var previousValue = -1;
var currentResult = 1;
for (var i = 0; i <= x; ++i)
{
var sum = currentResult + previousValue;
previousValue = currentResult;
currentResult = sum;
}
return currentResult;
}
}
Any ideas on what maybe going on?
As we've established in the comments, you can workaround this performance bash by pinning the process to a specific processor on the AMD Opteron machines.
Kindled by this not-really-on-topic question I decided to have a look at possible scenarios where single core pinning would make such a difference (from 11 to 7 seconds seems a bit extreme).
The most plausible answer is not that revolutionary:
The AMD Opteron series employ HyperTransport in a so-called NUMA architecture, instead of a traditional FSB as you would find on Intel's SMP CPU's (Xeon 4850 included)
My guess is that this symptom stems from the fact that individual nodes in a NUMA architecture has individual cache, as opposed to the Intel CPU, in which the processor cache is shared.
In other words, when consecutive computations shift between nodes on the Opteron, the cache is flushed, whereas balancing between processors in an SMP architecture like the Xeon 4850 has no such impact since the cache is shared.
Setting affinity in .NET is pretty easy, just pick a processor (let's just take the first one for simplicity):
static void Main(string[] args)
{
Console.WriteLine(Environment.ProcessorCount);
Console.Read();
//An AffinityMask of 0x0001 will make sure the process is always pinned to processer 0
Process thisProcess = Process.GetCurrentProcess();
thisProcess.ProcessorAffinity = (IntPtr)0x0001;
const int ITERATIONS = 10000;
const int FIBONACCI = 100000;
var watch = new Stopwatch();
watch.Start();
DoFibonnacci(ITERATIONS, FIBONACCI);
watch.Stop();
Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
Although I'm pretty sure this is not very smart in a NUMA environment.
Windows 2008 R2 has some cool native NUMA functionality, and I found a promissing codeplex project with a .NET wrapper for this as well: http://multiproc.codeplex.com/
I'm in no way near qualified to teach you how to utilize this technology, but this should point you in the right direction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With