.NET concurrency performance on the client side

Question

I am writing a client-side .NET application which is expected to use a lot of threads. I was warned that .NET performance is very bad when it comes to concurrency. While I am not writing a real-time application, I want to make sure my application is scalable (i.e. allows many threads) and is somehow comparable to an equivalent C++ application.

What is your experience? What is a relevant benchmark?

Aaronaught · Accepted Answer

I threw together a quick-and-dirty benchmark in C# using a prime generator as a test. The test generates primes up to a constant limit (I chose 500000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a specific number of threads, either using the .NET ThreadPool or standalone threads.

The test was run on a Quad-Core Q6600 running Windows Vista (x64). This is not using the Task Parallel Library, just simple threads. It was run for the following scenarios:

Serial execution (no threading)
4 threads (i.e. one per core), using the ThreadPool
40 threads using the ThreadPool (to test the efficiency of the pool itself)
4 standalone threads
40 standalone threads, to simulate context-switching pressure

The results were:

Test | Threads | ThreadPool | Time
-----+---------+------------+--------
1    | 1       | False      | 00:00:17.9508817
2    | 4       | True       | 00:00:05.1382026
3    | 40      | True       | 00:00:05.3699521
4    | 4       | False      | 00:00:05.2591492
5    | 40      | False      | 00:00:05.0976274

Conclusions one can draw from this:

Parallelization isn't perfect (as expected - it never is, no matter the environment), but splitting the load across 4 cores results in about 3.5x more throughput, which is hardly anything to complain about.
There was negligible difference between 4 and 40 threads using the ThreadPool, which means that no significant expense is incurred with the pool, even when you bombard it with requests.
There was negligible difference between the ThreadPool and free-threaded versions, which means that the ThreadPool does not have any significant "constant" expense;
There was negligible difference between the 4-thread and 40-thread free-threaded versions, which means that .NET doesn't perform any worse than one would expect it to with heavy context-switching.

Do we even need a C++ benchmark to compare to? The results are pretty clear: Threads in .NET are not slow. Unless you, the programmer, write poor multi-threading code and end up with resource starvation or lock convoys, you really don't need to worry.

With .NET 4.0 and the TPL and improvements to the ThreadPool, work-stealing queues and all that cool stuff, you have even more leeway to write "questionable" code and still have it run efficiently. You don't get these features at all from C++.

For reference, here is the test code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace ThreadingTest
{
    class Program
    {
        private static int PrimeMax = 500000;
        private static int TestRunCount = 800;

        static void Main(string[] args)
        {
            Console.WriteLine("Test | Threads | ThreadPool | Time");
            Console.WriteLine("-----+---------+------------+--------");
            RunTest(1, 1, false);
            RunTest(2, 4, true);
            RunTest(3, 40, true);
            RunTest(4, 4, false);
            RunTest(5, 40, false);
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

        static void RunTest(int sequence, int threadCount, bool useThreadPool)
        {
            TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
            Console.WriteLine("{0} | {1} | {2} | {3}",
                sequence.ToString().PadRight(4),
                threadCount.ToString().PadRight(7),
                useThreadPool.ToString().PadRight(10),
                duration);
        }

        static TimeSpan Time(Action action)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            action();
            sw.Stop();
            return sw.Elapsed;
        }

        static void GeneratePrimes(int threadCount, bool useThreadPool)
        {
            if (threadCount == 1)
            {
                TestPrimes(TestRunCount);
                return;
            }

            int testsPerThread = TestRunCount / threadCount;
            int remaining = threadCount;
            using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
            {
                for (int i = 0; i < threadCount; i++)
                {
                    Action testAction = () =>
                    {
                        TestPrimes(testsPerThread);
                        if (Interlocked.Decrement(ref remaining) == 0)
                        {
                            finishedEvent.Set();
                        }
                    };

                    if (useThreadPool)
                    {
                        ThreadPool.QueueUserWorkItem(s => testAction());
                    }
                    else
                    {
                        ThreadStart ts = new ThreadStart(testAction);
                        Thread th = new Thread(ts);
                        th.Start();
                    }
                }
                finishedEvent.WaitOne();
            }
        }

        [MethodImpl(MethodImplOptions.NoOptimization)]
        static void IteratePrimes(IEnumerable<int> primes)
        {
            int count = 0;
            foreach (int prime in primes) { count++; }
        }

        static void TestPrimes(int testRuns)
        {
            for (int t = 0; t < testRuns; t++)
            {
                var primes = Primes.GenerateUpTo(PrimeMax);
                IteratePrimes(primes);
            }
        }
    }
}

And here is the prime generator:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ThreadingTest
{
    public class Primes
    {
        public static IEnumerable<int> GenerateUpTo(int maxValue)
        {
            if (maxValue < 2)
                return Enumerable.Empty<int>();

            bool[] primes = new bool[maxValue + 1];
            for (int i = 2; i <= maxValue; i++)
                primes[i] = true;

            for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
            {
                if (primes[i])
                {
                    for (int j = i * i; j <= maxValue; j += i)
                        primes[j] = false;
                }
            }

            return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
        }
    }
}

If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:

Don't listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language/environment is "bad" in some particular area, because they are probably talking out of their... rear ends.

.NET concurrency performance on the client side

Tags:

performance

.net

multithreading

concurrency

scalability

Yaron Naveh

1 Answers

Aaronaught

Recent Activity

Donate For Us

.NET concurrency performance on the client side

Tags:

performance

.net

multithreading

concurrency

scalability

Yaron Naveh

1 Answers

Aaronaught

Related questions

Recent Activity

Donate For Us