
Why are 1000 threads faster than a few?

I have a simple program that searches linearly in an array of 2D points. I do 1000 searches in an array of 1,000,000 points.

The curious thing is that if I spawn 1000 threads, the program runs as fast as when I spawn only as many threads as I have CPU cores, or when I use Parallel.For. This is contrary to everything I know about creating threads. Creating and destroying threads is supposed to be expensive, but obviously not in this case.

Can someone explain why?

Note: this is a methodological example; the search algorithm is deliberately not optimal. The focus is on threading.

Note 2: I tested on a 4-core i7 and a 3-core AMD; the results follow the same pattern!

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

/// <summary>
/// We search for closest points.
/// For every point in array searchData, we search inputData for the closest point
/// and store it at the same position in array resultData.
/// </summary>
class Program
{
    class Point
    {
        public double X { get; set; }
        public double Y { get; set; }

        public double GetDistanceFrom(Point p)
        {
            double dx, dy;
            dx = p.X - X;
            dy = p.Y - Y;
            return Math.Sqrt(dx * dx + dy * dy);
        }
    }

    const int inputDataSize = 1_000_000;
    static Point[] inputData = new Point[inputDataSize];

    const int searchDataSize = 1000;
    static Point[] searchData = new Point[searchDataSize];
    static Point[] resultData = new Point[searchDataSize];

    // One shared generator: two Random instances created back to back can get the
    // same time-based seed on .NET Framework and produce identical sequences.
    static readonly Random rand = new Random();

    static void GenerateRandomData(Point[] array)
    {
        for (int i = 0; i < array.Length; i++)
        {
            array[i] = new Point()
            {
                X = rand.NextDouble() * 100_000,
                Y = rand.NextDouble() * 100_000
            };
        }
    }

    // Linear search: stores in resultData[i] the point of inputData closest to searchData[i].
    private static void SearchOne(int i)
    {
        var searchPoint = searchData[i];
        foreach (var p in inputData)
        {
            if (resultData[i] == null)
            {
                resultData[i] = p;
            }
            else
            {
                double oldDistance = searchPoint.GetDistanceFrom(resultData[i]);
                double newDistance = searchPoint.GetDistanceFrom(p);
                if (newDistance < oldDistance)
                    resultData[i] = p;
            }
        }
    }

    // One thread per search point (1000 threads).
    static void AllThreadSearch()
    {
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < searchDataSize; i++)
        {
            var thread = new Thread(
                obj =>
                {
                    int index = (int)obj;
                    SearchOne(index);
                });
            thread.Start(i);
            threads.Add(thread);
        }
        foreach (var t in threads) t.Join();
    }

    // One thread per logical processor, each searching a contiguous range.
    static void FewThreadSearch()
    {
        int threadCount = Environment.ProcessorCount;
        int workSize = searchDataSize / threadCount;
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(
                obj =>
                {
                    int[] range = (int[])obj;
                    int from = range[0];
                    int to = range[1];
                    for (int index = from; index < to; index++)
                        SearchOne(index);
                });
            int rangeFrom = workSize * i;
            // The last thread also takes the remainder when the division is not exact
            // (e.g. 1000 / 3 cores), so no search point is skipped.
            int rangeTo = (i == threadCount - 1) ? searchDataSize : workSize * (i + 1);
            thread.Start(new int[] { rangeFrom, rangeTo });
            threads.Add(thread);
        }
        foreach (var t in threads) t.Join();
    }

    // Let Parallel.For partition the work over the thread pool.
    static void ParallelThreadSearch()
    {
        System.Threading.Tasks.Parallel.For(0, searchDataSize,
                index =>
                {
                    SearchOne(index);
                });
    }

    static void Main(string[] args)
    {
        Console.Write("Generating data...  ");
        GenerateRandomData(inputData);
        GenerateRandomData(searchData);
        Console.WriteLine("Done.");

        Stopwatch watch = new Stopwatch();

        Console.Write("All thread searching... ");
        Array.Clear(resultData, 0, resultData.Length);   // start each run from empty results
        watch.Restart();
        AllThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.Write("Few thread searching... ");
        Array.Clear(resultData, 0, resultData.Length);
        watch.Restart();
        FewThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.Write("Parallel thread searching... ");
        Array.Clear(resultData, 0, resultData.Length);
        watch.Restart();
        ParallelThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.WriteLine("Press ENTER to quit.");
        Console.ReadLine();
    }
}

EDIT: Please make sure to run the app outside the debugger. The Visual Studio debugger slows down the multi-threaded case.
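A simple safeguard is to have the benchmark check System.Diagnostics.Debugger.IsAttached before it starts timing. The sketch below is only an illustration; the BenchmarkGuard class and its WarnIfDebugging method are hypothetical names, not part of the original program. It does not detect a Debug build, which also skews timings, so building in Release remains a separate step.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkGuard
{
    // Returns true (and prints a warning) when a debugger is attached,
    // because timings taken under the debugger are not representative.
    public static bool WarnIfDebugging()
    {
        if (Debugger.IsAttached)
        {
            Console.WriteLine("Warning: debugger attached; timings will be skewed.");
            return true;
        }
        return false;
    }

    static void Main()
    {
        // Call this once at the top of the benchmark, before any Stopwatch starts.
        WarnIfDebugging();
    }
}
```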

EDIT 2: Some more tests.

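To make it clear, here is updated code that guarantees we really do have 1000 threads running at once (every thread waits on a ManualResetEvent that is set only after all of them have been started):

public static void AllThreadSearch()
{
    // Every thread blocks on startEvent until all of them have been started.
    ManualResetEvent startEvent = new ManualResetEvent(false);
    List<Thread> threads = new List<Thread>();
    for (int i = 0; i < searchDataSize; i++)
    {
        var thread = new Thread(obj => { startEvent.WaitOne(); SearchOne((int)obj); });
        thread.Start(i);
        threads.Add(thread);
    }
    startEvent.Set();   // release all 1000 threads at the same time
    foreach (var t in threads) t.Join();
}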
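Whether the threads really overlap can be checked by counting how many of them are inside the work section at the same moment. The sketch below assumes a short Thread.Sleep as a stand-in for SearchOne, and PeakConcurrency and Measure are hypothetical names. It uses the same ManualResetEvent start gate as the code above and records the highest count with Interlocked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PeakConcurrency
{
    static int current;   // threads currently inside the work section
    static int peak;      // highest value of 'current' observed so far

    public static int Measure(int threadCount)
    {
        current = 0;
        peak = 0;
        var startEvent = new ManualResetEvent(false);
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                startEvent.WaitOne();                        // same start gate as above
                int now = Interlocked.Increment(ref current);
                // Raise 'peak' to 'now' without losing concurrent updates.
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
                Thread.Sleep(50);                            // stand-in for SearchOne(index)
                Interlocked.Decrement(ref current);
            });
            thread.Start();
            threads.Add(thread);
        }
        startEvent.Set();
        foreach (var t in threads) t.Join();
        return peak;
    }

    static void Main()
    {
        Console.WriteLine($"Peak concurrency: {Measure(1000)}");
    }
}
```

A high peak only shows that many threads are alive and inside the section at once. With CPU-bound work, at most Environment.ProcessorCount of them are actually executing at any instant; the rest are waiting to be scheduled.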

Testing with a smaller array (100K elements), the results are:

1000 vs 8 threads

               Method |     Mean |    Error |    StdDev | Scaled |
--------------------- |---------:|---------:|----------:|-------:|
      AllThreadSearch | 323.0 ms | 7.307 ms | 21.546 ms |   1.00 |
      FewThreadSearch | 164.9 ms | 3.311 ms |  5.251 ms |   1.00 |
 ParallelThreadSearch | 141.3 ms | 1.503 ms |  1.406 ms |   1.00 |

Now 1000 threads are much slower, as expected. Parallel.For still beats them all, which is also logical.

However, when I grow the array to 500K elements (i.e. increase the amount of work every thread does), things start to look weird:

1000 vs 8, 500K

               Method |     Mean |    Error |   StdDev | Scaled |
--------------------- |---------:|---------:|---------:|-------:|
      AllThreadSearch | 890.9 ms | 17.74 ms | 30.61 ms |   1.00 |
      FewThreadSearch | 712.0 ms | 13.97 ms | 20.91 ms |   1.00 |
 ParallelThreadSearch | 714.5 ms | 13.75 ms | 12.19 ms |   1.00 |
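Comparing the two tables makes the pattern explicit. Using only the means reported above, the extra time of the 1000-thread run over the 8-thread run barely changes, while the work per thread grows five-fold, so the ratio between the two shrinks:

```latex
\begin{aligned}
\text{100K elements:}\quad & 323.0 - 164.9 = 158.1\ \text{ms}, & \frac{323.0}{164.9} &\approx 1.96 \\
\text{500K elements:}\quad & 890.9 - 712.0 = 178.9\ \text{ms}, & \frac{890.9}{712.0} &\approx 1.25
\end{aligned}
```

Spread over 1000 threads, that is roughly 0.16 to 0.18 ms each, and the difference between the two gaps is within the reported StdDev (21.5 and 30.6 ms). This extra time lumps together thread creation, scheduling and any cache effects; the tables alone cannot separate them.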

It looks like context switching has negligible cost, and thread creation is also relatively cheap. The only significant cost of having too many threads seems to be memory, more precisely the address space each thread reserves. That alone is bad enough.
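Most of that memory cost is thread stacks: each thread gets its own stack (1 MB by default for .NET threads on Windows), so 1000 threads claim on the order of 1 GB of virtual address space for stacks alone. If many threads are genuinely needed, the Thread constructor overload that takes a maxStackSize argument can request a smaller stack. A minimal sketch; SmallStackThreads and RunAll are hypothetical names, and the 256 KB value is an arbitrary example, not a recommendation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class SmallStackThreads
{
    // Starts 'count' threads that each run 'work', requesting 'maxStackSize' bytes
    // of stack via the Thread(ThreadStart, int maxStackSize) overload.
    // The runtime may round the value up to a platform minimum.
    public static int RunAll(int count, int maxStackSize, Action work)
    {
        var threads = new List<Thread>(count);
        for (int i = 0; i < count; i++)
        {
            var thread = new Thread(() => work(), maxStackSize);
            thread.Start();
            threads.Add(thread);
        }
        foreach (var t in threads) t.Join();
        return threads.Count;
    }

    static void Main()
    {
        int done = 0;
        int started = RunAll(1000, 256 * 1024, () => Interlocked.Increment(ref done));
        // After every Join has returned, 'done' equals the number of threads.
        Console.WriteLine($"{started} threads started, {done} work items completed");
    }
}
```

A smaller stack only helps if the work never recurses deeply; SearchOne, for example, uses very little stack.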

So are thread-creation costs really that small? We've all been told that creating threads is very bad and that context switches are evil.
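A rough way to put a number on thread-creation cost for a given machine is to time creating, starting and joining many empty threads one after another. A minimal sketch; ThreadCreationCost and AverageMicroseconds are hypothetical names, and the result depends heavily on OS, runtime and hardware, so no particular figure is implied here:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ThreadCreationCost
{
    // Creates, starts and joins 'count' threads that do no work, one at a time,
    // and returns the average wall-clock cost per thread in microseconds.
    public static double AverageMicroseconds(int count)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            var thread = new Thread(() => { });
            thread.Start();
            thread.Join();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds * 1000.0 / count;
    }

    static void Main()
    {
        AverageMicroseconds(100);   // warm-up, so JIT compilation is not measured
        Console.WriteLine($"~{AverageMicroseconds(1000):F1} microseconds per thread (create + start + join)");
    }
}
```

Running the threads one at a time measures the full create/start/exit/join round trip; it does not capture the scheduling and cache effects of 1000 threads competing for the same cores.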

Nick asked Oct 23 '18 09:10



1 Answer

You may want to consider how the application accesses memory. In the maximum-threads scenario you are effectively accessing memory sequentially, which is efficient from a caching point of view. The approach using a small number of threads is more random, causing cache misses. Depending on the CPU, there are performance counters that let you measure L1 and L2 cache hits and misses.
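The general cache effect described here can be seen in isolation by reading the same array once in sequential order and once in shuffled order. This is a minimal sketch of that effect, not a measurement of the question's program; AccessPattern, SumInOrder and MakeOrder are hypothetical names, and the array size is an arbitrary choice meant to exceed typical CPU caches.

```csharp
using System;
using System.Diagnostics;

static class AccessPattern
{
    // Sums data[i] for each index i in 'order'.
    public static long SumInOrder(int[] data, int[] order)
    {
        long sum = 0;
        foreach (int i in order) sum += data[i];
        return sum;
    }

    // Returns 0..n-1 in order, or a Fisher-Yates shuffle of it.
    public static int[] MakeOrder(int n, bool shuffle, int seed = 42)
    {
        var order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        if (shuffle)
        {
            var rand = new Random(seed);
            for (int i = n - 1; i > 0; i--)
            {
                int j = rand.Next(i + 1);               // 0 <= j <= i
                (order[i], order[j]) = (order[j], order[i]);
            }
        }
        return order;
    }

    static void Main()
    {
        const int n = 16_000_000;                       // 64 MB of ints
        var data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i & 0xFF;

        foreach (bool shuffle in new[] { false, true })
        {
            int[] order = MakeOrder(n, shuffle);
            var watch = Stopwatch.StartNew();
            long sum = SumInOrder(data, order);
            watch.Stop();
            Console.WriteLine($"{(shuffle ? "random" : "sequential")}: {watch.ElapsedMilliseconds} ms (sum {sum})");
        }
    }
}
```

Both passes add up the same values; only the order in which data is touched differs. The shuffled pass is typically slower, but by how much depends on the CPU, which is where the cache performance counters mentioned above come in.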

open-collar answered Nov 15 '22 20:11
