
Multithreading or task parallel library

I have an application which performs 30 independent tasks simultaneously using multithreading. Each task retrieves data over HTTP, performs a calculation and returns a result to the UI thread.

Can I use TPL to perform the same tasks?

Does TPL create 30 new threads and spread them over all the available cores, or does it just split the tasks over the available cores and use one thread per core?

Will there be a performance boost using TPL over multithreading in this case?

Bruce Adams asked Mar 26 '10 08:03


3 Answers

As a general rule, nothing stops the TPL from using more (or fewer) threads than there are cores.

To control the situation somewhat using TPL, my first approach would be this: make sure that the thread pool's maximum thread count is at least 30, then parallelize the tasks with a maximum concurrency level of 30. Within each task, you can acquire a semaphore before starting the CPU-bound computation to limit its concurrency to the number of cores. If you are not running under IIS or SQL Server, you can, and may wish to, set both the minimum and maximum number of thread pool threads to 30 to prevent the thread pool heuristics from adjusting the number of threads too much. (Provided, of course, that the TPL and the thread pool are not used for other purposes in your application during this time.)

The optimal number of threads depends on the situation. Consider e.g. your scenario: your tasks are not CPU bound when they retrieve data - they are network bound. As you start the tasks, it would be wise to increase parallelism so that downloads are carried out simultaneously. Your calculations may be CPU bound, however. In that case, decreasing the number of threads so that only one thread runs per core might yield better performance.
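The approach described above could be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: `download` and `calculate` are hypothetical stand-ins (the download is simulated with a sleep), only the pool minimum is raised (the default maximum is assumed to be above 30), and F# is used because it is the language of the other code on this page.

```fsharp
open System
open System.Threading
open System.Threading.Tasks

// Hypothetical stand-ins: a blocking fetch (simulated with a sleep) and a CPU-bound step.
let download (i: int) : int =
    Thread.Sleep 50
    i

let calculate (x: int) : int = x * x

let runAll (taskCount: int) : int[] =
    // Raise the pool minimum so that taskCount blocking downloads can start at once
    // instead of waiting for the pool heuristics to inject threads gradually.
    let minWorkers, minIo = ThreadPool.GetMinThreads()
    ThreadPool.SetMinThreads(max minWorkers taskCount, minIo) |> ignore

    // Lets at most one calculation per core run at any moment.
    use cpuGate = new SemaphoreSlim(Environment.ProcessorCount)
    let options = ParallelOptions(MaxDegreeOfParallelism = taskCount)
    let results = Array.zeroCreate<int> taskCount

    let work (i: int) =
        let data = download i      // network-bound part: up to taskCount at once
        cpuGate.Wait()             // CPU-bound part: limited to the core count
        try
            results.[i] <- calculate data
        finally
            cpuGate.Release() |> ignore

    Parallel.For(0, taskCount, options, Action<int>(work)) |> ignore
    results

let results = runAll 30
```

`MaxDegreeOfParallelism` is only an upper bound, so the actual number of threads still depends on the pool. Each worker also blocks a thread while it waits for its download, which is the cost the asynchronous approaches in the other answers avoid.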

TPL is now based on the new CLR Thread Pool.
The thread pool uses heuristics to decide about the number of threads.
There is a Channel9 video about the new thread pool with some insight.
The heuristics of the old thread pool and some bits about the new can be found here (last paragraph "What the Future Holds?").

The algorithm and the numbers have changed across the different versions of the CLR, and may well change again in the future.

There are many posts about the concurrency level; one I came across is here.

Andras Vass answered Sep 17 '22 21:09


I believe TPL will usually use one thread per core unless you specifically tell it to use more. It's possible that it will detect when that's not enough - e.g. in your case, where your tasks are going to spend most of their time waiting for data.

Is there any reason you can't use asynchronous web fetching? I suspect there's no need to have a thread per task or even a thread per core here. TPL makes various aspects of asynchronous programming easier, with things like continuations.

In terms of efficiency, is your application actually CPU bound? It sounds like you need to be getting the maximum appropriate level of parallelism at the network side - that's the bit to concentrate on, unless the calculations are really heavyweight.
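As a rough sketch of what asynchronous fetching with TPL continuations might look like: the names `fetchAsync`, `calculate` and `fetchAndCalculate` are hypothetical, the URIs are placeholders, and the fetch is simulated with `Task.Delay` (which assumes .NET 4.5 or later) so that no thread is held while "waiting for data".

```fsharp
open System
open System.Threading.Tasks

// Hypothetical stand-in for an asynchronous HTTP fetch: Task.Delay simulates
// network latency without occupying a thread while waiting.
let fetchAsync (uri: string) : Task<string> =
    Task.Delay(100).ContinueWith(Func<Task, string>(fun _ -> "payload from " + uri))

// Hypothetical calculation performed once the data has arrived.
let calculate (data: string) : int = data.Length

// The calculation is attached as a continuation, so it only runs after the fetch completes.
let fetchAndCalculate (uri: string) : Task<int> =
    (fetchAsync uri).ContinueWith(Func<Task<string>, int>(fun t -> calculate t.Result))

let uris = [ for i in 1 .. 30 -> sprintf "http://example.invalid/item/%d" i ]
let tasks = uris |> List.map fetchAndCalculate

// Blocking on Result is only for this console-style demo.
let results : int[] = (Task.WhenAll<int>(tasks)).Result
```

In a UI application you would presumably not block on `Result`, but attach one more continuation that runs on the UI thread, for example by passing `TaskScheduler.FromCurrentSynchronizationContext()` to `ContinueWith`.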


UPDATES - NOT FROM ORIGINAL AUTHOR

The answer above is great as always, but could be misleading because it does not reflect some important changes in the .NET 4.0 CLR.

As Andras says, the current TPL implementation uses the thread pool, and hence will use as many threads as required (the number of cores is irrelevant now):

The Task Parallel Library (TPL) is a collection of new classes specifically designed to make it easier and more efficient to execute very fine-grained parallel workloads on modern hardware. TPL has been available separately as a CTP for some time now, and was included in the Visual Studio 2010 CTP, but in those releases it was built on its own dedicated work scheduler. For Beta 1 of CLR 4.0, the default scheduler for TPL will be the CLR thread pool, which allows TPL-style workloads to “play nice” with existing, QUWI-based code, and allows us to reuse much of the underlying technology in the thread pool - in particular, the thread-injection algorithm, which we will discuss in a future post.

From:

http://blogs.msdn.com/b/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx
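One way to observe this for yourself is to ask a task which kind of thread it is running on. The snippet below is a small sketch that assumes the default scheduler and no special creation options; `ranOnPoolThread` is just an illustrative name.

```fsharp
open System
open System.Threading
open System.Threading.Tasks

// Starts a task on the default scheduler and reports whether its body
// ran on a thread that belongs to the CLR thread pool.
let ranOnPoolThread : bool =
    Task.Factory.StartNew(Func<bool>(fun () -> Thread.CurrentThread.IsThreadPoolThread)).Result

printfn "Task ran on a thread-pool thread: %b" ranOnPoolThread
```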

Jon Skeet answered Sep 19 '22 21:09


I have an application which performs 30 independent tasks simultaneously using multithreading. Each task retrieves data over HTTP, performs a calculation and returns a result to the UI thread.

That is an IO-bound concurrent program.

Can I use TPL to perform the same tasks?

You can, but the TPL is designed for CPU-bound parallel programs, so you would be abusing it.

Does TPL create 30 new threads and spread them over all the available cores, or does it just split the tasks over the available cores and use one thread per core?

Neither. The TPL essentially uses per-core wait-free work-stealing task queues to dynamically load balance CPU-intensive computations as they run.

Will there be a performance boost using TPL over multithreading in this case?

You will save 30 thread creations and the extra contention your unnecessary threads incur.

The correct solution to your problem is to write an asynchronous program that does not block threads. This is done by expressing the remainder of your computation after your downloads are complete as a continuation that is invoked with the data when the download has completed.

Microsoft's new F# programming language includes features specifically designed to make this easy. For example, your problem can be solved with only 5 lines of code in F#:

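// Starts one asynchronous workflow per URI: download, calculate, then post the result.
let fetchCalcAndPost uris calc post =
  for uri in uris do
    async { use client = new System.Net.WebClient()
            // let! suspends the workflow until the download completes, without blocking a thread
            let! data = client.AsyncDownloadString uri
            do calc data |> post }
    |> Async.Start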

This solution never blocks any thread so it is fully concurrent.
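To get a feel for the non-blocking behaviour without touching the network, here is a self-contained variation. It is only a sketch: `simulatedDownload`, `calc` and `fetchAndCalc` are hypothetical names, the URIs are placeholders, and `Async.Sleep` stands in for the real download.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for a download: Async.Sleep waits without blocking a thread.
let simulatedDownload (uri: string) : Async<string> =
    async {
        do! Async.Sleep 200
        return "payload from " + uri
    }

// Hypothetical calculation applied to each payload.
let calc (data: string) : int = data.Length

// Download, then calculate, expressed as one asynchronous workflow.
let fetchAndCalc (uri: string) : Async<int> =
    async {
        let! data = simulatedDownload uri
        return calc data
    }

let uris = [ for i in 1 .. 30 -> sprintf "http://example.invalid/item/%d" i ]

let stopwatch = Stopwatch.StartNew()
let results : int[] =
    uris
    |> List.map fetchAndCalc
    |> Async.Parallel            // run all 30 workflows concurrently
    |> Async.RunSynchronously    // block only here, to collect the results for the demo
stopwatch.Stop()

printfn "%d results in %d ms" results.Length stopwatch.ElapsedMilliseconds
```

Because the waits overlap, the elapsed time should be much closer to one simulated download than to thirty of them, although the exact figure depends on the machine.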

J D answered Sep 18 '22 21:09