understanding Parallel.Invoke, creation and reusing of threads

Question

I am trying to understand how Parallel.Invoke creates and reuses threads. I ran the following example code (from MSDN, https://msdn.microsoft.com/en-us/library/dd642243(v=vs.110).aspx):

using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadLocalDemo
{
        static void Main()
        {
            // Thread-Local variable that yields a name for a thread
            ThreadLocal<string> ThreadName = new ThreadLocal<string>(() =>
            {
                return "Thread" + Thread.CurrentThread.ManagedThreadId;
            });

            // Action that prints out ThreadName for the current thread
            Action action = () =>
            {
                // If ThreadName.IsValueCreated is true, it means that we are not the
                // first action to run on this thread.
                bool repeat = ThreadName.IsValueCreated;

                Console.WriteLine("ThreadName = {0} {1}", ThreadName.Value, repeat ? "(repeat)" : "");
            };

            // Launch eight of them. On 4 cores or less, you should see some repeat ThreadNames
            Parallel.Invoke(action, action, action, action, action, action, action, action);

            // Dispose when you are done
            ThreadName.Dispose();
        }
}

As I understand it, Parallel.Invoke tries to create 8 threads here - one for each action. So it creates the first thread, runs the first action, and by that gives a ThreadName to the thread. Then it creates the next thread (which gets a different ThreadName) and so on.

If it cannot create a new thread, it will reuse one of the threads created before. In this case, the value of repeat will be true and we can see this in the console output.

Is this correct until here?

The second-last comment ("Launch eight of them. On 4 cores or less, you should see some repeat ThreadNames") implies that the threads created by Invoke correspond to the available cpu threads of the processor: on 4 cores we have 8 cpu threads, at least one is busy (running the operating system and stuff), so Invoke can only use 7 different threads, so we must get at least one "repeat".

Is my interpretation of this comment correct?

I ran this code on my PC which has an Intel® Core™ i7-2860QM processor (i.e. 4 cores, 8 cpu threads). I expected to get at least one "repeat", but I didn't. When I changed the Invoke to take 10 instead of 8 actions, I got this output:

ThreadName = Thread6
ThreadName = Thread8
ThreadName = Thread6 (repeat)
ThreadName = Thread5
ThreadName = Thread3
ThreadName = Thread1
ThreadName = Thread10
ThreadName = Thread7
ThreadName = Thread4
ThreadName = Thread9

So I have at least 9 different threads in the console application. This contradicts the fact that my processor only has 8 threads.

So I guess some of my reasoning from above is wrong. Does Parallel.Invoke work differently than what I described above? If yes, how?

Evk · Accepted Answer

If you pass less then 10 items to Parallel.Invoke, and you don't specify MaxDegreeOfParallelism in options (so - your case), it will just run them all in parallel on thread pool sheduler using rougly the following code:

var actions = new [] { action, action, action, action, action, action, action, action };
var tasks = new Task[actions.Length];
for (int index = 1; index < tasks.Length; ++index)
    tasks[index] = Task.Factory.StartNew(actions[index]);
tasks[0] = new Task(actions[0]);
tasks[0].RunSynchronously();
Task.WaitAll(tasks);

So just a regular Task.Factory.StartNew. If you will look at max number of threads in thread pool

int th, io;
ThreadPool.GetMaxThreads(out th, out io);
Console.WriteLine(th);

You will see some big number, like 32767. So, number of threads on which Parallel.Invoke will be executed (in your case) are not limited to number of cpu cores at all. Even on 1-core cpu it might run 8 threads in parallel.

You might then think, why some threads are reused at all? Because when work is done on thread pool thread - that thread is returned to the pool and is ready to accept new work. Actions from your example basically do no work at all and complete very fast. So sometimes first thread started via Task.Factory.StartNew has already completed your action and is returned to the pool before all subsequent threads were started. So that thread is reused.

By the way, you can see (repeat) in your example with 8 actions, and even with 7 if you try hard enough, on a 8 core (16 logical cores) processor.

UPDATE to answer your comment. Thread pool scheduler will not necessary create new threads immediately. There is min and max number of threads in thread pool. How to see max I already shown above. To see min number:

int th, io;
ThreadPool.GetMinThreads(out th, out io);

This number will usually be equal to the number of cores (so for example 8). Now, when you request new action to be performed on thread pool thread, and number of threads in a thread pool is less than minimum - new thread will be created immeditely. However, if number of available threads is greater than minimum - certain delay will be introduced before creating new thread (I don't remember how long exactly unfortunately, about 500ms).

Statement you added in your comment I highly doubt can execute in 2-3 seconds. For me it executes for 0.3 seconds max. So when first 8 threads are created by thread pool, there is that 500ms delay before creating 9th. During that delay, some (or all) of first 8 threads are completed their job and are available for new work, so there is no need to create new thread and they can be reused.

To verify this, introduce bigger delay:

static void Main()
{
    // Thread-Local variable that yields a name for a thread
    ThreadLocal<string> ThreadName = new ThreadLocal<string>(() =>
    {
        return "Thread" + Thread.CurrentThread.ManagedThreadId;
    });

    // Action that prints out ThreadName for the current thread
    Action action = () =>
    {
        // If ThreadName.IsValueCreated is true, it means that we are not the
        // first action to run on this thread.
        bool repeat = ThreadName.IsValueCreated;            
        Console.WriteLine("ThreadName = {0} {1}", ThreadName.Value, repeat ? "(repeat)" : "");
        Thread.Sleep(1000000);
    };
    int th, io;
    ThreadPool.GetMinThreads(out th, out io);
    Console.WriteLine("cpu:" + Environment.ProcessorCount);
    Console.WriteLine(th);        
    Parallel.Invoke(Enumerable.Repeat(action, 100).ToArray());        

    // Dispose when you are done
    ThreadName.Dispose();
    Console.ReadKey();
}

You will see that now thread pool has to create new threads every time (much more than there are cores), because it cannot reuse previous threads while they are busy.

You can also increase number of min threads in thread pool, like this:

int th, io;
ThreadPool.GetMinThreads(out th, out io);
ThreadPool.SetMinThreads(100, io);

This will remove the delay (until 100 threads are created) and in above example you will notice that.

Zoran Horvat · Answer

Behind the scenes, threads are organized (and possessed by) the task scheduler. Primary purpose of the task scheduler is to keep all CPU cores used as much as possible with useful work.

Under the hood, scheduler is using the thread pool, and then size of the thread pool is the way to fine-tune usefulness of operations executed on CPU cores.

Now this requires some analysis. For instance, thread switching costs CPU cycles and it is not useful work. On the other hand, when one thread executes one task on a core, all other tasks are stalled and they are not progressing on that core. I believe that is the core reason why the scheduler is usually starting two threads per core, so that at least some movement is visible in case that one task takes longer to complete (like several seconds).

There are corollaries to this basic mechanism. When some tasks take long time to complete, scheduler starts new threads to compensate. That means that long-running task will now have to compete for the core with short-running tasks. In that way, short tasks will be completed one after another, and long task will slowly progress to its completion as well.

Bottom line is that your observations about threads are generally correct, but not entirely true in specific situations. In concrete execution of a number of tasks, scheduler might choose to raise more threads, or to keep going with the default. That is why you will sometimes notice that number of threads differs.

Remember the goal of the game: Utilize CPU cores with useful work as much as possible, while at the same time making all tasks move, so that the application doesn't look like frozen. Historically, people used to try to reach these goals with many different techniques. Analysis had shown that many of those techniques were applied randomly and didn't really increase CPU utilization. That analysis has lead to introduction of task schedulers in .NET, so that fine-tuning can be coded once and be done well.

understanding Parallel.Invoke, creation and reusing of threads

Tags:

c#

multithreading

Kjara

2 Answers

Evk

Zoran Horvat

Recent Activity

Donate For Us

understanding Parallel.Invoke, creation and reusing of threads

Tags:

c#

multithreading

Kjara

2 Answers

Evk

Zoran Horvat

Related questions

Recent Activity

Donate For Us