Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How many threads to use?

I know there are some existing questions and they provide a very good general perspective on things. I'm hoping to get some details on the C#/VB.Net side for the actual implementation (not philosophy) of some of these perspectives.

My Particular Case

I have a WCF Service which, amongst other things, receives files. For most of the service's life this particular area is actually just sat doing nothing - when work does come it arrives in high bursts of greatly varying quantities.

For each file received (which at a max can be thousands per second) the service needs to work on the files for between 1-10 seconds (each) depending on a number of other services, local resources, and network IO wait times.

To aid the service with these burst workloads I implemented a Queue system. Those thousands of files recieved per second are placed onto the Queue. A controller calculates the number of threads to use based on the size of the queue, up until it reaches a "Peak Max Threads" setting which prevents it from creating additional threads. These threads are placed in a thread pool, and reused to cycle through the queue. The controller will; at intervals; recalculate the number of threads required. If the queue size reduces, a relevant number of threads are released.

The age old problem

How many threads should I peak at? Clearly, adding a new thread everytime a file was received would be silly for lack of a better word - the performance, at best, would deteriorate. Capping the threads when CPU utilization is only 10% across each core, also doesn't seem to be the best use of resources.

So, is there an appropriate way to determine how many threads to cap at? I would rather the service could determine this for itself by sampling available resources, but is there a performance hit from doing so? I know the common answer is to monitor workloads, adjust the counts through trial and error until I find a number I like, but due to the nature of this service (long periods of idle followed by high/burst workloads) it could take a long time to get that kind of information.

What then if we move the server's image to a different host which is faster/slower/different to the first? I have to re-sample the process all over again?

Ideally what I'm after, is for the co-ordinator to intelligently increase the size of the threadpool until CPU utilisation is at x% (would 80% be reasonable? 90%? 99%?). Clearly, I want to do this without adding more threads than is necessary to hit x% otherwise all I'll end up with is threads not just waiting on IO resources, but awaiting each other too.

Thanks in advance!


Related questions (if you want some generic ideas):

How many threads to create?

How many threads is too many?

How many threads to create and when?


A Complication for you

Where would be the fun if I didn't make the problem more difficult?

As it currently stands, the service does hit 100% cpu during these bursts, regularly. The issue is the CPU utilisation spikes. It goes from idle (0-10%) to 100%, and back down again. I'm not sure I can help that - ideally I wouldn't take it all the way to 100%. The problem exists because the files mentioned are in fact images, and part of the services' process is to pass the image through to the System.Windows.Media blackbox which does some complex image processing for me.

There are then lulls in between the spikes because of the IO waits and other processing that goes on. If the spikes hitting 100% can't be helped (and I'm all for knowing how to prevent that, or if I should) how should I aim for the CPU utilisation graph to look? Sat constantly at 100%? Bouncing between 50-100? If I do go through the effort of sampling to decide what does seem to work best, is it guaranteed that switching the virtual servers' host will also work best with the same graph?

This added complexity I won't take into consideration for those of you willing to answer. Feel free to ignore this section. However, any answer that also accounts for this complication, or even answers that just provide tips on how to handle it, I'll at the very least upvote!

Heck of a long question - sorry about that - and thanks for reading so much!!

like image 615
Smudge202 Avatar asked Jul 07 '11 09:07

Smudge202


People also ask

How do I know how many threads to use?

The optimal number of threads is likely to be either the number of cores in your machine or the number of cores times two. In more abstract terms, you want the highest possible throughput.

How many threads should my CPU be using?

Each CPU core can have two threads. So a processor with two cores will have four threads. A processor with eight cores will have 16 threads.

How many threads should I use for multithreading?

Each CPU core can have up to two threads if your CPU has multi/hyper-threading enabled. You can search for your own CPU processor to find out more. For Mac users, you can find out from About > System Report. This means that my 6-Core i7 processor has 6 cores and can have up to 12 threads.

How many thread should I create?

On Windows machines, there's no limit specified for threads. Thus, we can create as many threads as we want, until our system runs out of available system memory.


1 Answers

PerformanceCounter allows you to query for processor usage.

However ,have you tried something the framework provides?

        foreach (var file in files)
        {
            var workitem = file;
            Task.Factory.StartNew(() =>
            {
                // do work on workitem
            }, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
        }

You can tune the concurrency level for Tasks in the Task.Factory.

The .NET 4 threadpool by default will schedule the number of threads it finds most performing on the hardware where it runs, but you can change how that works with the previous link.

Probably you need a custom solution but it would be ok to benchmark yours with the standard.

Edit: (comment note):

No links needed, I may have used an invented term since english is not my language. What I mean is: have a variable where you store the variance before the last check (prevDelta), and call it delta. add this to the varuiable avrageDelta and divide by 2, each time you 'check'. You will have the variable averageDelta that will mostly be low since you have no activity. Then have another set of delta variables, one you have already (delta - prevdelta), and store it in a delta variable that is not the average of all deltas but the average of deltas in a small timespan (you will have to come up with an algortihm to calculate accurately this temporal variance). Once done this you can compare the average delta and the 'temporal delta'. The average delta will be mostly low and will slowly go up whjen bursts come. In the same period the temporal delta will go up really fast. Then you have the situation when the burst stops, the average delta goes slowly down, and the 'temporal' goes really fast.

like image 159
Marino Šimić Avatar answered Sep 20 '22 02:09

Marino Šimić