I'm working on a project that spawns several hundred threads. All of these threads are in a "sleeping" state (they are blocked on a Monitor object). I have noticed that if I increase the number of "sleeping" threads, the program slows down considerably. The "funny" thing is that, according to Task Manager, the more threads there are, the more idle the processor is. I have narrowed the problem down to object creation.
Can someone explain it to me?
I have produced a small sample to test it. It's a console program. It creates one thread per processor and measures its speed with a simple test (a "new Object()"). No, the "new Object()" isn't jitted away (try it if you don't trust me). The main thread shows the speed of each thread. Each time you press CTRL-C, the program spawns 50 "sleeping" threads. The slowdown begins with just 50 threads. With around 250, it's clearly visible in Task Manager that the CPU isn't 100% used (on mine it's 82%).
I have tried three ways of blocking the "sleeping" threads: Thread.CurrentThread.Suspend() (bad, bad, I know :-) ), a lock on an already locked object, and Thread.Sleep(Timeout.Infinite). The result is the same. If I comment out the line with the new Object() and replace it with a Math.Sqrt (or with nothing), the problem disappears: the speed doesn't change with the number of threads. Can someone else check it? Does anyone know where the bottleneck is?
Ah... you should test it in Release mode WITHOUT launching it from Visual Studio. I'm using XP SP3 on a dual-processor machine (no HT). I have tested it with .NET 3.5 and 4.0 (to test the different framework runtimes).
namespace TestSpeed
{
    using System;
    using System.Collections.Generic;
    using System.Threading;

    class Program
    {
        private const long ticksInSec = 10000000;
        private const long ticksInMs = ticksInSec / 1000;
        private const int threadsTime = 50;
        private const int stackSizeBytes = 256 * 1024;
        private const int waitTimeMs = 1000;

        private static List<int> collects = new List<int>();
        private static int[] objsCreated;

        static void Main(string[] args)
        {
            objsCreated = new int[Environment.ProcessorCount];
            Monitor.Enter(objsCreated);

            for (int i = 0; i < objsCreated.Length; i++)
            {
                new Thread(Worker).Start(i);
            }

            int[] oldCount = new int[objsCreated.Length];
            DateTime last = DateTime.UtcNow;

            Console.Clear();
            int numThreads = 0;
            Console.WriteLine("Press Ctrl-C to generate {0} sleeping threads, Ctrl-Break to end.", threadsTime);

            Console.CancelKeyPress += (sender, e) =>
            {
                if (e.SpecialKey != ConsoleSpecialKey.ControlC)
                {
                    return;
                }

                for (int i = 0; i < threadsTime; i++)
                {
                    new Thread(() =>
                    {
                        /* The same for all the three "ways" to lock forever a thread */
                        //Thread.CurrentThread.Suspend();
                        //Thread.Sleep(Timeout.Infinite);
                        lock (objsCreated) { }
                    }, stackSizeBytes).Start();

                    Interlocked.Increment(ref numThreads);
                }

                e.Cancel = true;
            };

            while (true)
            {
                Thread.Sleep(waitTimeMs);
                Console.SetCursorPosition(0, 1);

                DateTime now = DateTime.UtcNow;
                long ticks = (now - last).Ticks;
                Console.WriteLine("Slept for {0}ms", ticks / ticksInMs);

                Thread.MemoryBarrier();

                for (int i = 0; i < objsCreated.Length; i++)
                {
                    int count = objsCreated[i];
                    Console.WriteLine("{0} [{1} Threads]: {2}/sec ", i, numThreads, ((long)(count - oldCount[i])) * ticksInSec / ticks);
                    oldCount[i] = count;
                }

                Console.WriteLine();
                CheckCollects();
                last = now;
            }
        }

        private static void Worker(object obj)
        {
            int ix = (int)obj;

            while (true)
            {
                /* First and second are slowed by threads, third, fourth, fifth and "nothing" aren't*/
                new Object();
                //if (new Object().Equals(null)) return;
                //Math.Sqrt(objsCreated[ix]);
                //if (Math.Sqrt(objsCreated[ix]) < 0) return;
                //Interlocked.Add(ref objsCreated[ix], 0);

                Interlocked.Increment(ref objsCreated[ix]);
            }
        }

        private static void CheckCollects()
        {
            int newMax = GC.MaxGeneration;

            while (newMax > collects.Count)
            {
                collects.Add(0);
            }

            for (int i = 0; i < collects.Count; i++)
            {
                int newCol = GC.CollectionCount(i);

                if (newCol != collects[i])
                {
                    collects[i] = newCol;
                    Console.WriteLine("Collect gen {0}: {1}", i, newCol);
                }
            }
        }
    }
}
Using threads can simplify the logic of the application and also take advantage of multiple processors, but creating too many threads can cause overall application performance problems due to contention for resources.
On Windows machines, there's no limit specified for threads. Thus, we can create as many threads as we want, until our system runs out of available system memory.
There is nothing in the C++ standard that limits the number of threads. However, the OS will certainly have a hard limit. Having too many threads decreases the throughput of your application, so it's recommended that you use a thread pool.
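Since the question is about .NET, here is a minimal sketch of what "use a thread pool" could look like there. It queues many short work items on the shared pool instead of creating one dedicated thread per item. The names PoolDemo and RunOnPool are made up for this illustration and are not from the question.

```csharp
using System;
using System.Threading;

static class PoolDemo
{
    // Hypothetical helper: runs `workItems` tiny jobs on the shared thread pool
    // and returns how many of them completed.
    public static int RunOnPool(int workItems)
    {
        int completed = 0;

        // CountdownEvent lets the caller block until every queued item has signalled.
        using (var done = new CountdownEvent(workItems))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref completed);
                    done.Signal();
                });
            }

            done.Wait();
        }

        return completed;
    }
}
```

Calling PoolDemo.RunOnPool(500) processes 500 items while the pool decides how many threads to use, rather than leaving 500 threads (and their stacks) alive in the process.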
General rule of thumb for threading an application: 1 thread per CPU core. On a quad-core PC that means 4. As was noted, the Xbox 360, however, has 3 cores with 2 hardware threads each, so 6 threads in this case.
Start Taskmgr.exe and open the Processes tab. Choose View + Select Columns and tick "Page Fault Delta". You'll see the impact of allocating hundreds of megabytes just to store the stacks of all these threads you created. Every time that number blips for your process, your program blocks while it waits for the operating system to page in data from the disk into RAM.
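To put rough numbers on the stack cost, here is a small back-of-the-envelope sketch. The 256 KB figure is the stackSizeBytes from the question's code; the 1 MB figure is an assumption about the usual default stack size when none is passed to the Thread constructor. StackBudget and its methods are hypothetical names used only for this illustration.

```csharp
using System;

static class StackBudget
{
    // Address space reserved for thread stacks, in bytes.
    public static long ReservedBytes(int threadCount, int stackSizeBytes)
    {
        return (long)threadCount * stackSizeBytes;
    }

    // Same figure expressed in MiB (1 MiB = 1024 * 1024 bytes).
    public static double ReservedMiB(int threadCount, int stackSizeBytes)
    {
        return ReservedBytes(threadCount, stackSizeBytes) / (1024.0 * 1024.0);
    }
}
```

With these assumptions, StackBudget.ReservedMiB(250, 256 * 1024) gives 62.5 for the question's 250 threads, while StackBudget.ReservedMiB(250, 1024 * 1024) gives 250 for the same thread count with an assumed 1 MB default stack.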
TANSTAAFL, There ain't no such thing as a free lunch.
My guess is that the problem is that garbage collection requires a certain amount of cooperation between threads - something either needs to check that they're all suspended, or ask them to suspend themselves and wait for it to happen, etc. (And even if they are suspended, it has to tell them not to wake up!)
This describes a "stop the world" garbage collector, of course. I believe there are at least two or three different GC implementations which differ in the details around parallelism... but I suspect that all of them are going to have some work to do in terms of getting threads to cooperate.
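One way to probe this guess is to time a fixed number of generation 0 collections while a given number of threads are parked. This is only a sketch under assumptions: GcPauseProbe and TimeCollections are hypothetical names, the threads are parked on a ManualResetEvent rather than the Monitor used in the question, and it measures forced GC.Collect(0) calls, which may not behave exactly like collections triggered by allocation.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class GcPauseProbe
{
    // Hypothetical helper: parks `parkedThreads` threads, then times
    // `collections` forced gen 0 collections and returns the elapsed time.
    public static TimeSpan TimeCollections(int parkedThreads, int collections)
    {
        using (var gate = new ManualResetEvent(false))
        {
            var threads = new Thread[parkedThreads];
            for (int i = 0; i < threads.Length; i++)
            {
                // 256 KB stacks, matching stackSizeBytes in the question.
                threads[i] = new Thread(() => { gate.WaitOne(); }, 256 * 1024);
                threads[i].IsBackground = true;
                threads[i].Start();
            }

            // Note: some threads may not have reached WaitOne yet; this is a rough probe.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < collections; i++)
            {
                GC.Collect(0);
            }
            sw.Stop();

            // Release the parked threads and wait for them to exit.
            gate.Set();
            foreach (var t in threads)
            {
                t.Join();
            }

            return sw.Elapsed;
        }
    }
}
```

Comparing, say, GcPauseProbe.TimeCollections(0, 1000) with GcPauseProbe.TimeCollections(250, 1000) in a Release build outside the debugger would show whether the per-collection cost grows with the number of parked threads on a given runtime. I'm not claiming any particular result here.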