I have a machine with two AMD Opteron(tm) 6282 SE 2.6 GHz processors, 16 cores each (32 cores in total), and a C# mathematical application that I can run in parallel across the cores.
The main part of my app performs best when I divide the work among 16 threads; the running time for this part is then 1 ms.
If I use more than 16 threads, it takes longer than 1 ms.
My question is: why can't I parallelize this part across more threads, given that I have 32 cores?
This is the code that runs in parallel:
int N = 238;
int P = 16;
int Chunk = N / P;
AutoResetEvent signal = new AutoResetEvent(false);
// use a counter to reduce kernel transitions
int counter = P;
for (int c = 0; c < P; c++)
{
    // for each chunk
    ThreadPool.QueueUserWorkItem(delegate(Object o)
    {
        int lc = (int)o;
        for (int i = lc * Chunk; i < (lc + 1 == P ? N : (lc + 1) * Chunk); i++)
        {
            // do something
        }
        if (Interlocked.Decrement(ref counter) == 0)
        {
            signal.Set();
        }
    }, c);
}
signal.WaitOne();
First off, I think you should definitely replace your construct with the new .NET 4.0 Parallel.For construct:
Parallel.For(0, N,
    i =>
    {
        // do something
    });
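For reference, a self-contained sketch of the Parallel.For version might look like the following. The per-index work (squaring i) and the names ParallelForSketch and SumOfSquares are placeholders invented for illustration, since the original loop body is not shown. The overload with a per-worker local value is used so that the threads do not contend on a shared variable, and MaxDegreeOfParallelism makes it easy to compare 16 against 32 workers.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical stand-in for "do something": sums i*i for i in [0, n).
    public static long SumOfSquares(int n, int maxThreads)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.For<long>(0, n, options,
            () => 0L,                                      // starting value for each worker
            (i, loopState, local) => local + (long)i * i,  // per-index work on worker-local state
            local => Interlocked.Add(ref total, local));   // merge once per worker

        return total;
    }
}
```

Timing ParallelForSketch.SumOfSquares(238, 16) against ParallelForSketch.SumOfSquares(238, 32) with a Stopwatch, over many repetitions, would show whether the extra workers help for a loop this small.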
Secondly, you are in fact using two CPUs with 16 cores each. Most likely the scheduler is smart enough to exploit locality and schedule all 16 of your threads on the same CPU. Once the other CPU comes into play, depending on your computation, shared data may have to travel all the way through main memory to keep the two CPUs coherent. This can be very costly.
ThreadPool is reactive, and it can take a while before new threads are added to the pool. Basically, if there have not been enough threads for some time, it grows the pool, and when threads sit idle again, it shrinks it back. So the pool size fluctuates between the minimum and maximum configured on the ThreadPool class, both of which can be read or set.
If you know how many threads you need, use SetMinThreads to ensure you have enough threads at the start.
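As a rough sketch of that suggestion, the pool's minimum can be raised before the work items are queued. The class name ThreadPoolWarmup, the helper EnsureMinWorkerThreads, and the value 32 in the usage note are assumptions for illustration; GetMinThreads and SetMinThreads are the actual ThreadPool methods.

```csharp
using System;
using System.Threading;

public static class ThreadPoolWarmup
{
    // Hypothetical helper: raises the worker-thread minimum to at least `wanted`,
    // leaving the I/O completion-port minimum unchanged.
    public static bool EnsureMinWorkerThreads(int wanted)
    {
        int minWorker, minIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        if (minWorker >= wanted)
        {
            return true; // the pool already guarantees enough threads
        }
        // SetMinThreads returns false if the request is rejected,
        // for example when it exceeds the pool's maximum.
        return ThreadPool.SetMinThreads(wanted, minIo);
    }
}
```

Calling ThreadPoolWarmup.EnsureMinWorkerThreads(32) once at startup, before the first QueueUserWorkItem call, should avoid the delay while the pool ramps up.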
The problem was that my EXE file was compiled as 32-bit while the operating system was 64-bit.
From 64-bit Applications:
Due to the design of x86 emulation and the WOW64 subsystem for the Itanium processor family, applications are restricted to execution on one processor.
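A quick way to check for this situation at run time is to report the bitness of the process and the OS together with the number of processors the runtime sees. This is only a minimal sketch: the class and method names are made up, while Environment.Is64BitOperatingSystem and Environment.Is64BitProcess (both available from .NET 4.0), IntPtr.Size, and Environment.ProcessorCount are standard members.

```csharp
using System;

public static class BitnessCheck
{
    // Hypothetical helper: summarizes how the current process is running.
    public static string Describe()
    {
        return string.Format(
            "64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bytes, logical processors: {3}",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size,               // 4 in a 32-bit process, 8 in a 64-bit one
            Environment.ProcessorCount);
    }
}
```

If the OS is reported as 64-bit but the process is not, the EXE is running under WOW64, and rebuilding with the platform target set to x64 (or Any CPU) is the usual way to get a 64-bit process.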