How many threads is too many?

Tags:

I am writing a server, and I send each action of into a separate thread when the request is received. I do this because almost every request makes a database query. I am using a threadpool library to cut down on construction/destruction of threads.

My question is: what is a good cutoff point for I/O threads like these? I know it would just be a rough estimate, but are we talking hundreds? Thousands?

How would I go about figuring out what this cutoff would be?

EDIT:

Thank you all for your responses, it seems like I am just going to have to test it to find out my thread count ceiling. The question is though: how do I know I've hit that ceiling? What exactly should I measure?

690

asked Jan 27 '09 00:01

ryeguy

2 Answers

Some people would say that two threads is too many - I'm not quite in that camp :-)

Here's my advice: measure, don't guess. One suggestion is to make it configurable and initially set it to 100, then release your software to the wild and monitor what happens.

If your thread usage peaks at 3, then 100 is too much. If it remains at 100 for most of the day, bump it up to 200 and see what happens.

You could actually have your code itself monitor usage and adjust the configuration for the next time it starts but that's probably overkill.

For clarification and elaboration:

I'm not advocating rolling your own thread pooling subsystem, by all means use the one you have. But, since you were asking about a good cut-off point for threads, I assume your thread pool implementation has the ability to limit the maximum number of threads created (which is a good thing).

I've written thread and database connection pooling code and they have the following features (which I believe are essential for performance):

a minimum number of active threads.
a maximum number of threads.
shutting down threads that haven't been used for a while.

The first sets a baseline for minimum performance in terms of the thread pool client (this number of threads is always available for use). The second sets a restriction on resource usage by active threads. The third returns you to the baseline in quiet times so as to minimise resource use.

You need to balance the resource usage of having unused threads (A) against the resource usage of not having enough threads to do the work (B).

(A) is generally memory usage (stacks and so on) since a thread doing no work will not be using much of the CPU. (B) will generally be a delay in the processing of requests as they arrive as you need to wait for a thread to become available.

That's why you measure. As you state, the vast majority of your threads will be waiting for a response from the database so they won't be running. There are two factors that affect how many threads you should allow for.

The first is the number of DB connections available. This may be a hard limit unless you can increase it at the DBMS - I'm going to assume your DBMS can take an unlimited number of connections in this case (although you should ideally be measuring that as well).

Then, the number of threads you should have depend on your historical use. The minimum you should have running is the minimum number that you've ever had running + A%, with an absolute minimum of (for example, and make it configurable just like A) 5.

The maximum number of threads should be your historical maximum + B%.

You should also be monitoring for behaviour changes. If, for some reason, your usage goes to 100% of available for a significant time (so that it would affect the performance of clients), you should bump up the maximum allowed until it's once again B% higher.

In response to the "what exactly should I measure?" question:

What you should measure specifically is the maximum amount of threads in concurrent use (e.g., waiting on a return from the DB call) under load. Then add a safety factor of 10% for example (emphasised, since other posters seem to take my examples as fixed recommendations).

In addition, this should be done in the production environment for tuning. It's okay to get an estimate beforehand but you never know what production will throw your way (which is why all these things should be configurable at runtime). This is to catch a situation such as unexpected doubling of the client calls coming in.

153

answered Sep 30 '22 03:09

paxdiablo

This question has been discussed quite thoroughly and I didn't get a chance to read all the responses. But here's few things to take into consideration while looking at the upper limit on number of simultaneous threads that can co-exist peacefully in a given system.

Thread Stack Size : In Linux the default thread stack size is 8MB (you can use ulimit -a to find it out).
Max Virtual memory that a given OS variant supports. Linux Kernel 2.4 supports a memory address space of 2 GB. with Kernel 2.6 , I a bit bigger (3GB )
[1] shows the calculations for the max number of threads per given Max VM Supported. For 2.4 it turns out to be about 255 threads. for 2.6 the number is a bit larger.
What kindda kernel scheduler you have . Comparing Linux 2.4 kernel scheduler with 2.6 , the later gives you a O(1) scheduling with no dependence upon number of tasks existing in a system while first one is more of a O(n). So also the SMP Capabilities of the kernel schedule also play a good role in max number of sustainable threads in a system.

Now you can tune your stack size to incorporate more threads but then you have to take into account the overheads of thread management(creation/destruction and scheduling). You can enforce CPU Affinity to a given process as well as to a given thread to tie them down to specific CPUs to avoid thread migration overheads between the CPUs and avoid cold cash issues.

Note that one can create thousands of threads at his/her wish , but when Linux runs out of VM it just randomly starts killing processes (thus threads). This is to keep the utility profile from being maxed out. (The utility function tells about system wide utility for a given amount of resources. With a constant resources in this case CPU Cycles and Memory, the utility curve flattens out with more and more number of tasks ).

I am sure windows kernel scheduler also does something of this sort to deal with over utilization of the resources

[1] http://adywicaksono.wordpress.com/2007/07/10/i-can-not-create-more-than-255-threads-on-linux-what-is-the-solutions/

answered Sep 30 '22 03:09

Jay D

Related questions
                            
                                Does ruby have real multithreading?
                            
                                How to make a function wait until a callback has been called using node.js
                            
                                ThreadStart with parameters
                            
                                How to pass parameters to ThreadStart method in Thread?
                            
                                What resources are shared between threads?
                            
                                What is a "thread" (really)?
                            
                                Why doesn't JavaScript support multithreading?
                            
                                If async-await doesn't create any additional threads, then how does it make applications responsive?
                            
                                How to pause / sleep thread or process in Android?
                            
                                How can I pass a parameter to a Java Thread?
                            
                                What is the difference between atomic / volatile / synchronized?
                            
                                C# version of java's synchronized keyword?
                            
                                multiprocessing.Pool: When to use apply, apply_async or map?
                            
                                Getting the thread ID from a thread
                            
                                Why use a ReentrantLock if one can use synchronized(this)?
                            
                                When should the volatile keyword be used in C#?
                            
                                Start thread with member function
                            
                                Using module 'subprocess' with timeout
                            
                                What's the difference between deadlock and livelock?
                            
                                What is the difference between asynchronous programming and multithreading?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How many threads is too many?

Tags:

performance

multithreading

threadpool