I have a simple program that starts n threads and creates some load on each thread. If I start only one thread, one core gets about 100% load. If I start one process with 16 threads (one thread per core), I only get about 80% load. If I start 8 processes with 2 threads each (still one thread per core), I get about 99% load. I don't use any locking in this sample.
What is the reason for this behavior? I understand that the load goes down if there are 100 threads working, because the OS has to do a lot of scheduling. But in this case there are only as many threads as cores.
It gets even stranger (for me at least). If I add a simple Thread.Sleep(0) in my loop, the load with one process and 16 threads rises to about 95%.
Can anyone answer this, or provide a link with more information about this specific topic?
    // Sample application which reads the number of threads to be started from Console.ReadLine
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text;
    using System.Threading;

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Enter the number of threads to be started");
            int numberOfThreadsToStart;
            string input = Console.ReadLine();
            // TryParse leaves the value at 0 on failure, so the range check covers bad input too
            if (!int.TryParse(input, out numberOfThreadsToStart) || numberOfThreadsToStart < 1)
            {
                Console.WriteLine("No valid number of threads entered. Exit now");
                Thread.Sleep(1500);
                return;
            }
            List<Thread> threadList = new List<Thread>();
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < numberOfThreadsToStart; i++)
            {
                Thread workerThread = new Thread(MakeSomeLoad);
                workerThread.Start();
                threadList.Add(workerThread);
            }
            // Keep the main thread alive while the workers run
            while (true)
            {
                Console.WriteLine("I'm spinning... ");
                Thread.Sleep(2000);
            }
        }

        static void MakeSomeLoad()
        {
            for (int i = 0; i < 100000000; i++)
            {
                for (int j = 0; j < i; j++)
                {
                    // uncomment the following line to increase the load
                    // Thread.Sleep(0);
                    // Each pass allocates a new StringBuilder and a new string
                    StringBuilder sb = new StringBuilder();
                    sb.Append("hello world" + j);
                }
            }
        }
    }
Your test looks very GC heavy. If you have 16 threads in one process, the GC will run more often in that process, and since the client (workstation) GC isn't parallel, this leads to a lower load. In other words, you have 16 garbage-producing threads per GC thread.
On the other hand, if you run 8 processes with two threads each, only two threads produce garbage for each GC thread, and the GC can work in parallel across those processes.
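A rough way to see how much garbage collection the loop triggers is to compare GC.CollectionCount(0) before and after a bounded run of the same allocation pattern. This is only a sketch: the GcPressureProbe class, the CountGen0Collections helper, and the iteration count are made up for illustration, and the counter is process-wide, so it also includes collections caused by other threads.

```csharp
using System;
using System.Text;

static class GcPressureProbe
{
    // Hypothetical helper: repeats the allocation pattern from the inner loop
    // of MakeSomeLoad for a bounded number of passes and returns how many
    // generation 0 collections the process performed in the meantime.
    public static int CountGen0Collections(int iterations)
    {
        int before = GC.CollectionCount(0);
        for (int j = 0; j < iterations; j++)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("hello world" + j);
        }
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        int iterations = 5000000; // arbitrary value chosen for illustration
        Console.WriteLine("Gen 0 collections: " + CountGen0Collections(iterations));
    }
}
```

Running this once in a single process and once in each of several processes should give a feel for how often the collector interrupts the worker threads.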
If you write a test that produces less garbage, and uses more CPU directly, you will likely get different results.
(Note that this is only speculation. I didn't run your test, and since I only have a dual-core CPU, my results would differ from yours anyway.)
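A test that produces less garbage could look roughly like the sketch below, which swaps the StringBuilder allocations for plain integer arithmetic (a xorshift step, picked here only because it is cheap and allocation-free). The CpuBoundLoad class, the Spin method, and the iteration count are assumptions for illustration, not part of the original sample.

```csharp
using System;
using System.Threading;

static class CpuBoundLoad
{
    // Hypothetical replacement for MakeSomeLoad: pure arithmetic on a local
    // variable, so the loop itself allocates nothing on the managed heap.
    public static ulong Spin(long iterations)
    {
        ulong state = 88172645463325252UL; // arbitrary non-zero seed
        for (long i = 0; i < iterations; i++)
        {
            // One xorshift step per iteration
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
        }
        return state;
    }

    public static void Main()
    {
        int threadCount = Environment.ProcessorCount; // one worker per logical core
        ulong[] results = new ulong[threadCount];
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int slot = t; // copy the loop variable so each lambda writes its own slot
            workers[t] = new Thread(() => { results[slot] = Spin(2000000000L); });
            workers[t].Start();
        }
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        // Printing a result keeps the computed values observably in use
        Console.WriteLine("All " + threadCount + " workers finished; last state: " + results[threadCount - 1]);
    }
}
```

If the speculation above holds, a load like this should get close to 100% on all cores even with 16 threads in a single process, because the collector has almost nothing to do.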
Something else to consider is that the garbage collector has different modes: workstation GC (concurrent or non-concurrent) and server GC.
You can find some of the graphic details of each here.
Since your process uses lots of threads and allocates a whole lot of memory, you should try server GC.
The server GC is optimized for high throughput and high scalability in server applications where there is a consistent load and requests are allocating and deallocating memory at a high rate. The server GC uses one heap and one GC thread per processor and tries to balance the heaps as much as possible. At the time of a garbage collection, the GC threads work on their respective heaps and rendezvous at certain points. Since they all work on their own heaps, minimal locking is needed, which makes it very efficient in this type of situation.
You enable the server GC in your App.config:
    <configuration>
      <runtime>
        <gcServer enabled="true" />
      </runtime>
    </configuration>
Note that this only works on a multiprocessor (or multicore) system. If Windows reports only one processor, you get the non-concurrent workstation GC instead.
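To confirm which mode the process actually ended up with, you can ask the runtime at startup. The sketch below uses GCSettings.IsServerGC and GCSettings.LatencyMode from System.Runtime; the GcModeReport class and its Describe method are hypothetical names used only for this example.

```csharp
using System;
using System.Runtime;

static class GcModeReport
{
    // Hypothetical helper: summarizes the GC configuration of the current process.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "Server GC" : "Workstation GC";
        return mode
            + ", latency mode " + GCSettings.LatencyMode
            + ", " + Environment.ProcessorCount + " logical processors";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Calling this before starting the worker threads makes it easy to tell whether the gcServer setting was picked up.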