.NET concurrency performance on the client side

I am writing a client-side .NET application which is expected to use a lot of threads. I was warned that .NET performance is very bad when it comes to concurrency. While I am not writing a real-time application, I want to make sure my application is scalable (i.e. allows many threads) and performs roughly on par with an equivalent C++ application.

What is your experience? What is a relevant benchmark?

Yaron Naveh asked Nov 27 '22 02:11


1 Answer

I threw together a quick-and-dirty benchmark in C# using a prime generator as a test. The test generates primes up to a constant limit (I chose 500000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a specific number of threads, either using the .NET ThreadPool or standalone threads.

The test was run on a Quad-Core Q6600 running Windows Vista (x64). This is not using the Task Parallel Library, just simple threads. It was run for the following scenarios:

  • Serial execution (no threading)
  • 4 threads (i.e. one per core), using the ThreadPool
  • 40 threads using the ThreadPool (to test the efficiency of the pool itself)
  • 4 standalone threads
  • 40 standalone threads, to simulate context-switching pressure

The results were:

Test | Threads | ThreadPool | Time
-----+---------+------------+--------
1    | 1       | False      | 00:00:17.9508817
2    | 4       | True       | 00:00:05.1382026
3    | 40      | True       | 00:00:05.3699521
4    | 4       | False      | 00:00:05.2591492
5    | 40      | False      | 00:00:05.0976274

Conclusions one can draw from this:

  • Parallelization isn't perfect (as expected - it never is, no matter the environment), but splitting the load across 4 cores cuts the run from about 18 seconds to about 5, roughly 3.5x the serial throughput, which is hardly anything to complain about.

  • There was negligible difference between 4 and 40 threads using the ThreadPool, which means that no significant expense is incurred with the pool, even when you queue ten times as many work items as there are cores.

  • There was negligible difference between the ThreadPool and free-threaded versions, which means that the ThreadPool does not have any significant "constant" expense.

  • There was negligible difference between the 4-thread and 40-thread free-threaded versions, which means that .NET doesn't perform any worse than one would expect it to with heavy context-switching.
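The speedup figures behind these conclusions can be checked directly from the table. The following is only a small sketch of that arithmetic; the class name SpeedupCheck and the helper Speedup are made up for illustration, and "efficiency" here simply means speedup divided by the 4 cores of the test machine.

```csharp
using System;
using System.Globalization;

public static class SpeedupCheck
{
    // Ratio of serial time to parallel time. Both strings use the
    // hh:mm:ss.fffffff form that the benchmark prints.
    public static double Speedup(string serial, string parallel)
    {
        TimeSpan s = TimeSpan.Parse(serial, CultureInfo.InvariantCulture);
        TimeSpan p = TimeSpan.Parse(parallel, CultureInfo.InvariantCulture);
        return s.TotalSeconds / p.TotalSeconds;
    }

    public static void Main()
    {
        const int cores = 4; // the test machine is a quad-core
        const string serial = "00:00:17.9508817"; // test 1 from the table

        // Tests 2 to 5, copied from the results table.
        string[] labels = { "4 pool threads", "40 pool threads",
                            "4 standalone threads", "40 standalone threads" };
        string[] times = { "00:00:05.1382026", "00:00:05.3699521",
                           "00:00:05.2591492", "00:00:05.0976274" };

        for (int i = 0; i < times.Length; i++)
        {
            double speedup = Speedup(serial, times[i]);
            double efficiency = speedup / cores;
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0,-22} speedup {1:F2}x, efficiency {2:F0}%",
                labels[i], speedup, efficiency * 100));
        }
    }
}
```

All four parallel runs land between roughly 3.3x and 3.5x, which is where the "about 3.5x" figure comes from.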

Do we even need a C++ benchmark to compare to? The results are pretty clear: Threads in .NET are not slow. Unless you, the programmer, write poor multi-threading code and end up with resource starvation or lock convoys, you really don't need to worry.

With .NET 4.0, the TPL, and the improved ThreadPool (work-stealing queues and all that cool stuff), you have even more leeway to write "questionable" code and still have it run efficiently. Standard C++ does not give you these features out of the box; you would have to bring in a library such as Intel TBB to get something comparable.
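For comparison, here is a rough sketch of what the same workload might look like with the TPL's Parallel.For instead of hand-managed threads. This was not part of the benchmark above and no timings were collected for it; the class TplSketch, CountPrimes and RunParallel are hypothetical names, and the sieve is a simplified stand-in for the prime generator used in the test.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TplSketch
{
    // Simplified sieve: returns how many primes are <= maxValue.
    public static int CountPrimes(int maxValue)
    {
        if (maxValue < 2) return 0;
        bool[] composite = new bool[maxValue + 1];
        int count = 0;
        for (int i = 2; i <= maxValue; i++)
        {
            if (composite[i]) continue;
            count++;
            // long arithmetic so i * i cannot overflow for large maxValue
            for (long j = (long)i * i; j <= maxValue; j += i)
                composite[j] = true;
        }
        return count;
    }

    // Runs the sieve 'runs' times and lets the TPL decide how to
    // partition the iterations across worker threads.
    public static long RunParallel(int runs, int maxValue)
    {
        long total = 0;
        Parallel.For(0, runs, i =>
        {
            Interlocked.Add(ref total, CountPrimes(maxValue));
        });
        return total;
    }

    public static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        long total = RunParallel(800, 500000); // same sizes as the benchmark
        sw.Stop();
        Console.WriteLine("Primes counted: {0}, elapsed: {1}", total, sw.Elapsed);
    }
}
```

The point of the sketch is that there is no thread count, no event and no manual bookkeeping; whether it is actually faster than the hand-rolled version is something you would have to measure.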

For reference, here is the test code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace ThreadingTest
{
    class Program
    {
        // Upper bound for the sieve and number of sieve runs per test.
        private const int PrimeMax = 500000;
        private const int TestRunCount = 800;

        static void Main(string[] args)
        {
            Console.WriteLine("Test | Threads | ThreadPool | Time");
            Console.WriteLine("-----+---------+------------+--------");
            RunTest(1, 1, false);
            RunTest(2, 4, true);
            RunTest(3, 40, true);
            RunTest(4, 4, false);
            RunTest(5, 40, false);
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

        static void RunTest(int sequence, int threadCount, bool useThreadPool)
        {
            TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
            Console.WriteLine("{0} | {1} | {2} | {3}",
                sequence.ToString().PadRight(4),
                threadCount.ToString().PadRight(7),
                useThreadPool.ToString().PadRight(10),
                duration);
        }

        // Measures the wall-clock time taken by a single invocation of action.
        static TimeSpan Time(Action action)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            return sw.Elapsed;
        }

        static void GeneratePrimes(int threadCount, bool useThreadPool)
        {
            if (threadCount == 1)
            {
                TestPrimes(TestRunCount);
                return;
            }

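            // Integer division: exact for 4 and 40 threads (200 and 20 runs each),
            // but any remainder would be dropped for other thread counts.
            int testsPerThread = TestRunCount / threadCount;
            // Number of workers still running; the last one to finish sets the event.
            int remaining = threadCount;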
            using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
            {
                for (int i = 0; i < threadCount; i++)
                {
                    Action testAction = () =>
                    {
                        TestPrimes(testsPerThread);
                        if (Interlocked.Decrement(ref remaining) == 0)
                        {
                            finishedEvent.Set();
                        }
                    };

                    if (useThreadPool)
                    {
                        ThreadPool.QueueUserWorkItem(s => testAction());
                    }
                    else
                    {
                        ThreadStart ts = new ThreadStart(testAction);
                        Thread th = new Thread(ts);
                        th.Start();
                    }
                }
                finishedEvent.WaitOne();
            }
        }

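        // GenerateUpTo returns a lazy sequence, so it has to be enumerated for the
        // filtering work to happen. NoOptimization keeps the JIT from discarding
        // the otherwise unused loop.
        [MethodImpl(MethodImplOptions.NoOptimization)]
        static void IteratePrimes(IEnumerable<int> primes)
        {
            int count = 0;
            foreach (int prime in primes) { count++; }
        }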

        static void TestPrimes(int testRuns)
        {
            for (int t = 0; t < testRuns; t++)
            {
                var primes = Primes.GenerateUpTo(PrimeMax);
                IteratePrimes(primes);
            }
        }
    }
}
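As an aside, the "counter plus ManualResetEvent" pattern in GeneratePrimes can be written more compactly on .NET 4.0 and later with CountdownEvent. The sketch below is only an illustration of that alternative, not the code that produced the numbers above; CountdownSketch and RunWorkers are hypothetical names, and the worker body is a placeholder for TestPrimes(testsPerThread).

```csharp
using System;
using System.Threading;

public static class CountdownSketch
{
    // Starts threadCount workers, waits for all of them, and returns
    // how many actually ran.
    public static int RunWorkers(int threadCount, bool useThreadPool)
    {
        int completed = 0;
        using (CountdownEvent done = new CountdownEvent(threadCount))
        {
            for (int i = 0; i < threadCount; i++)
            {
                Action work = () =>
                {
                    // Placeholder for the real per-thread workload.
                    Interlocked.Increment(ref completed);
                    done.Signal(); // one fewer worker outstanding
                };

                if (useThreadPool)
                {
                    ThreadPool.QueueUserWorkItem(s => work());
                }
                else
                {
                    new Thread(new ThreadStart(work)).Start();
                }
            }
            done.Wait(); // blocks until Signal has been called threadCount times
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(RunWorkers(4, true));
        Console.WriteLine(RunWorkers(40, false));
    }
}
```

The behaviour is the same as the original: the calling thread blocks until every worker has reported in. CountdownEvent just folds the Interlocked.Decrement and the Set call into one Signal.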

And here is the prime generator:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ThreadingTest
{
    public class Primes
    {
        public static IEnumerable<int> GenerateUpTo(int maxValue)
        {
            if (maxValue < 2)
                return Enumerable.Empty<int>();

            bool[] primes = new bool[maxValue + 1];
            for (int i = 2; i <= maxValue; i++)
                primes[i] = true;

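            // Only candidates up to sqrt(maxValue) need sieving; compute the
            // bound once instead of on every iteration.
            int limit = (int)Math.Sqrt(maxValue);
            for (int i = 2; i <= limit; i++)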
            {
                if (primes[i])
                {
                    for (int j = i * i; j <= maxValue; j += i)
                        primes[j] = false;
                }
            }

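            // The sieve above runs eagerly; this filter is lazy and only
            // executes when the caller enumerates the result.
            return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);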
        }
    }
}
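One detail of the generator worth spelling out is that the final Where is deferred: nothing is filtered until the sequence is enumerated, which is why the test code walks the result in IteratePrimes. A tiny standalone sketch of that behaviour, using a made-up class LazyDemo and a trivial odd-number filter in place of the sieve lookup:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Returns the number of predicate calls observed before and after
    // the query is actually enumerated.
    public static int[] EvaluationsBeforeAndAfter()
    {
        int evaluated = 0;
        IEnumerable<int> query = Enumerable.Range(2, 9) // the values 2..10
            .Where(i => { evaluated++; return i % 2 == 1; });

        int before = evaluated;   // the predicate has not run yet
        int matches = query.Count(); // enumeration forces the filter to run
        int after = evaluated;    // one call per element in the range

        return new[] { before, after, matches };
    }

    public static void Main()
    {
        int[] r = EvaluationsBeforeAndAfter();
        Console.WriteLine("before: {0}, after: {1}, matches: {2}", r[0], r[1], r[2]);
    }
}
```

Without the enumeration step, each benchmark iteration would only measure the eager sieve and skip the filtering pass entirely.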

If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:

Don't listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language/environment is "bad" in some particular area, because they are probably talking out of their... rear ends.

Aaronaught answered Dec 04 '22 01:12