Background Information
I have a distributed processing application that does data analysis. It is designed to do parallel processing of many sets of data updated in real time. As part of the design, the analysis has been broken up into analytic nodes. Each node takes source data and processes it to create other data, which can then in turn be used by other nodes. To do our current full set of analysis on one data set requires about 200 nodes.
In the current design, each node runs with its own thread. Now, most of the time these threads are asleep. They wake up each in turn like a waterfall whenever data is updated, and then they go back to sleep. The application is currently in production running on 40 sets of data, each requiring 200 nodes, using 8000 threads. When there is no data coming in, there is no load on the server. When the data comes in at its busiest times, the server spikes to about 25% CPU. This is all within the design and production parameters of the project.
Now for the next step, we are scaling the 40 sets of data to 200. Each set requires 200 nodes which means a total of 40000 nodes, which is 40000 threads. This exceeds the max PID of our server, so I requested that our server admins increase the cap. They did it, and the application works, but they gave me some push-back about the number of threads. I'm not denying that the number of threads is unusual, but it is expected and warranted by this stage of our design.
I am planning some small tweaks to the design to separate the thread from the node. This would allow us to configure one thread to run multiple nodes and reduce our thread count. For data sets that are not updated frequently, there will be very little performance impact from having one thread execute the data updates for every node. For data sets that are updated hundreds of times per second, we can configure each node to run on its own thread. In fact, I don't doubt that this design change will be made -- it's only a matter of when. In the meantime, I'd like as much information as I can get about the consequences of the current design.
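To make the planned change concrete, here is a minimal sketch of one way to decouple nodes from threads: give each configurable group of nodes a single-threaded executor, so rarely-updated data sets can share one thread while hot data sets get one node per group. All names here (NodeGroup, onDataUpdate) are hypothetical, not from the actual application.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: instead of one thread per node, a group of nodes
// shares one worker thread via a single-threaded executor.
public class NodeGroup {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<Runnable> nodes;

    public NodeGroup(List<Runnable> nodes) {
        this.nodes = nodes;
    }

    // When the group's source data updates, run every node on the shared thread,
    // in order, preserving the "waterfall" update sequence.
    public void onDataUpdate() {
        for (Runnable node : nodes) {
            worker.submit(node);
        }
    }

    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A hot data set would simply be configured as 200 groups of one node each, recovering the current one-thread-per-node behaviour.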
Question
What are the costs of running with over 40,000 threads on one machine? How much performance am I losing by having the JVM / Linux OS manage this many threads? Please remember that they are all configured properly to sleep when there is no work. So, I'm just talking about extra overhead and problems caused by the sheer number of threads.
Please note - I know that I can reduce the number of threads, and I know that it's a good idea to make this design change. I'll do it as soon as I can, but it has to be balanced against other work and design considerations. I'm asking this question to gather information in order to make a good decision. Your thoughts and comments along these lines are much appreciated.
In the JVM, each thread needs a thread stack (256 KB by default) plus a Thread object and its connected objects. The default stack size can be changed with the -Xss option, but I believe 64 KB is the lower limit. (40,000 x 256 KB is about 10 GB ...)
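The arithmetic above is easy to check, and the JVM also lets you request a per-thread stack size through the four-argument Thread constructor (the stackSize argument is only a hint, and some VMs ignore it entirely). A small illustration:

```java
public class StackBudget {
    public static void main(String[] args) {
        long threads = 40_000L;
        long stackBytes = 256L * 1024;       // the 256 KB default assumed above
        long total = threads * stackBytes;   // virtual memory reserved for stacks
        System.out.println(total);           // 10,485,760,000 bytes, roughly 10 GB

        // Per-thread override: Thread(ThreadGroup, Runnable, String, long stackSize).
        // The stackSize is a suggestion the VM is free to ignore.
        Thread t = new Thread(null, () -> {}, "small-stack", 64 * 1024);
        t.start();
    }
}
```

Note that stack memory is reserved virtual address space; pages are only committed as each thread actually uses its stack.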
On Linux, each thread also occupies an OS thread descriptor, which holds the thread's register context when the thread is not executing ... and other stuff. These descriptors are preallocated and, I believe, not paged. This is the resource that your admins needed to increase.
These resources are used whether the thread is awake or sleeping.
Another issue is that you need to be a bit careful about synchronizing using wait / notifyAll. If there are lots of threads waiting on the same object, then a notifyAll will cause a flurry of activity as each thread gets woken up. (But you can avoid this by not having lots of threads waiting on the same object.)
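One common way to avoid that thundering-herd wakeup is to give each node its own queue, so delivering an update wakes at most one thread instead of notifyAll waking all of them. A hedged sketch (NodeInbox and deliver are made-up names for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Sketch: per-node queue instead of a shared monitor. BlockingQueue.take()
// parks the thread without spinning, and an offer() wakes only this node.
public class NodeInbox implements Runnable {
    private final BlockingQueue<String> inbox = new ArrayBlockingQueue<>(1024);
    private final Consumer<String> node;   // the node's update handler

    public NodeInbox(Consumer<String> node) {
        this.node = node;
    }

    // Called by the upstream node; wakes at most one waiting thread.
    public void deliver(String update) {
        inbox.offer(update);
    }

    @Override
    public void run() {
        try {
            while (true) {
                node.accept(inbox.take());   // block until work arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // exit cleanly on interrupt
        }
    }
}
```

With this shape, an incoming data update only wakes the threads whose nodes actually consume it.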
See the Oracle Java Threading page for more info on the consequences of using huge numbers of threads.
My feeling is that 40,000 threads is excessive. The ideal number of threads is proportional to the number of physical processors / cores you have. While you won't necessarily see a decrease in performance by having huge numbers of threads, you will be tying down lots of resources, and that could have indirect performance issues; e.g. longer GC times, potential VM thrashing.
A better architecture for your application would be to implement a thread pool and work queues to farm the work out to a much smaller number of active threads.
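A minimal sketch of that architecture: a fixed pool sized to the hardware, with the executor's internal work queue farming node updates out to it. (AnalysisPool and submitNodeUpdate are illustrative names, not from the asker's codebase.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: a pool sized to the number of cores. 40,000 nodes become 40,000
// small Runnables queued onto a handful of threads, not 40,000 threads.
public class AnalysisPool {
    private final ExecutorService pool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Called whenever a data set updates: enqueue each affected node's work.
    public void submitNodeUpdate(Runnable nodeWork) {
        pool.submit(nodeWork);
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

One caveat for this particular application: a shared pool does not by itself preserve per-data-set ordering of the waterfall updates, so the real design would need ordering per data set (for example, the per-group single-threaded executors sketched earlier).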
Now, you said that threads will sleep when there is no work. How often will there be work? How many units of work are being done concurrently? If that number is greater than the number of processors, and the work is mostly CPU-bound as stated, you will actually see overall performance degradation.
But let's assume the amount of work done at any given time matches the number of processors. If that's the case, the number one issue I can see is the amount of context switching that will occur. A context switch in Java generally costs around 100 instructions. If all your threads get switched in (awakened) within a short period of time to do some of their work, then we are talking > 4,000,000 extra instructions.
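If you want a rough feel for what a context switch costs on your own hardware, a classic technique is to bounce a token between two threads through synchronous handoffs; each round trip forces at least two switches. This is a crude micro-benchmark sketch, not a precise measurement (JIT warmup, scheduling, and cache effects all add noise):

```java
import java.util.concurrent.SynchronousQueue;

// Rough sketch: measure thread handoff cost by ping-ponging a token.
// Each round trip through the two SynchronousQueues forces the scheduler
// to switch between the two threads at least twice.
public class SwitchCost {
    public static long nanosPerRoundTrip(int rounds) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        Thread echo = new Thread(() -> {
            try {
                while (true) {
                    pong.put(ping.take());   // bounce the token straight back
                }
            } catch (InterruptedException e) {
                // interrupted when the measurement is done
            }
        });
        echo.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);
            pong.take();
        }
        long elapsed = System.nanoTime() - start;
        echo.interrupt();
        return elapsed / rounds;   // very rough nanoseconds per round trip
    }
}
```

Run with enough rounds to amortize startup, and treat the result as an order-of-magnitude figure only.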
A bit more information on the cost of context switches, as they will probably affect your program more than anything else. An excerpt from this document explains the cost of repopulating the thread's local cache when switching in:
When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context switch causes a flurry of cache misses, and thus threads run a little more slowly when they are first scheduled. This is one of the reasons that schedulers give each runnable thread a certain minimum time quantum even when many other threads are waiting.
Aside from that, you have the added stack space that needs to be allocated, as well as heap for the 40,000 Thread objects (which is only around 7 MB of shallow heap for the Thread objects themselves).