How do I scale to use more threads if and only if there is free cpu? Something like a ThreadPoolExecutor that uses more threads when cpu cores are idle, and less or just one if not.
Current situation: My Java server app processes requests and serves results. There is a ThreadPoolExecutor to serve the requests with a reasonable number of max threads following the principle: number of cpu cores = number of max threads. The work performed is cpu heavy, and there's some disk IO (DBs). The code is linear, single threaded. A single request takes between 50 and 500 ms to process. Sometimes there are just a few requests per minute, and other times there are 30 simultaneous. A modern server with 12 cores handles the load nicely. The throughput is good, the latency is ok.
Desired improvement: When there is a low number of requests, as is the case most of the time, many cpu cores are idle. Latency could be improved in this case by running some of the code for a single request multi-threaded. Some prototyping shows improvements, but as soon as I test with a higher number of concurrent requests, the server goes bananas. Throughput goes down and memory consumption goes overboard. With 30 simultaneous requests sharing a queue of 10, at most 10 run while 20 wait. If each of those 10 uses up to 8 threads at once for parallelism, that is up to 80 threads, which seems to be too much for a machine with 12 cores (of which 6 are virtual).
This seems to me like a common use case, yet I could not find information by searching.
1) Request counting: One idea is to count the number of requests currently being processed. If it is 1 or low, use more parallelism; if it is high, use none and continue single-threaded as before. This sounds simple to implement. The drawbacks: the counter must be decremented reliably (think of a finally block), and it does not actually check the available cpu, since another process might be using cpu as well. In my case the machine is dedicated to just this application, but still.
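To make idea 1 concrete, here is a minimal sketch. The class name RequestGate and the policy of splitting the cores evenly among active requests are assumptions for illustration, not something from my actual server. The decrement sits in a finally block so the counter cannot leak when a handler throws.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical helper: counts in-flight requests and suggests a parallelism level. */
final class RequestGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int cores;

    RequestGate(int cores) {
        if (cores < 1) throw new IllegalArgumentException("cores must be >= 1");
        this.cores = cores;
    }

    /** Runs the handler with a suggested parallelism; the counter is always decremented. */
    <T> T handle(IntFunction<T> handler) {
        int active = inFlight.incrementAndGet();
        try {
            return handler.apply(parallelismFor(active));
        } finally {
            inFlight.decrementAndGet(); // runs even if the handler throws
        }
    }

    /** Assumed policy: divide the cores among active requests, never going below 1. */
    int parallelismFor(int active) {
        return Math.max(1, cores / Math.max(1, active));
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

A request handler would be wrapped as gate.handle(parallelism -> process(request, parallelism)), where process is whatever the server already does. Note that the suggestion is computed once at the start of a request and is not revised if load changes while the request runs.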
2) Actual cpu querying: I'd think that the correct approach would be to just ask the cpu, and then decide. Since Java 7 there is OperatingSystemMXBean.getSystemCpuLoad(), see http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad() but I can't find any webpage that mentions getSystemCpuLoad and ThreadPoolExecutor, or a similar combination of keywords, which tells me that's not a good path to go down. The JavaDoc says "Returns the "recent cpu usage" for the whole system", and I'm wondering what "recent cpu usage" means, how recent that is, and how expensive that call is.
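For reference, querying the value looks roughly like the sketch below. The class name CpuProbe and the mapping from load to a thread count are my own assumptions. The cast only works on JVMs that ship the com.sun.management extension, and according to the JavaDoc the call returns a value between 0.0 and 1.0, or a negative value when the load is not available.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper around com.sun.management.OperatingSystemMXBean. */
final class CpuProbe {

    /** Returns the recent system CPU load in [0.0, 1.0], or a negative value if unavailable. */
    @SuppressWarnings("deprecation") // newer JDKs deprecate this method in favour of getCpuLoad()
    static double systemCpuLoad() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0; // extension not present on this JVM
    }

    /** Assumed policy: use roughly as many threads as there are idle cores, at least 1. */
    static int suggestedParallelism(double load, int cores) {
        if (load < 0.0) {
            return 1; // load unknown: stay single-threaded
        }
        int idleCores = (int) Math.floor((1.0 - load) * cores);
        return Math.max(1, Math.min(cores, idleCores));
    }
}
```

The number is only a snapshot: several requests can read the same low load at the same moment and all decide to fan out.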
I had left this question open for a while to see if more input is coming. Nope. Although I don't like the "no-can-do" answer to technical questions, I'm going to accept Holger's answer now. He has good reputation, good arguments, and others have approved his answer. Myself I had experimented with idea 2 a bit. I queried the getSystemCpuLoad() in tasks to decide how large their own ExecutorService could be. As Holger wrote, when there is a SINGLE ExecutorService, resources can be managed well. But as soon as tasks start their own tasks, they cannot - it didn't work out for me.
There is no way of limiting based on “free CPU” and it wouldn’t work anyway. The information about “free CPU” is outdated as soon as you get it. Suppose you have twelve threads running concurrently, all detecting at the same time that there is one free CPU core and all deciding to schedule a sub-task…
What you can do is limiting the maximum resource consumption, which works quite well when using a single ExecutorService with a maximum number of threads for all tasks.
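As a rough sketch of such a shared pool (the class name SharedPool, the 60-second idle timeout, and the unbounded queue are assumptions, not requirements), both request tasks and their sub-tasks would be handed to this one executor so that the thread limit applies to everything:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory for the one executor shared by tasks and sub-tasks. */
final class SharedPool {

    /** Creates a pool that never runs more than maxThreads workers at once. */
    static ThreadPoolExecutor create(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,        // core size == max size: a hard upper limit
                60L, TimeUnit.SECONDS,         // assumed idle timeout
                new LinkedBlockingQueue<>());  // excess work waits in the queue
        pool.allowCoreThreadTimeOut(true);     // let idle workers die when load is low
        return pool;
    }
}
```

It is typically created once, for example with SharedPool.create(Runtime.getRuntime().availableProcessors()), and passed to every component that needs to schedule work.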
The tricky part is the dependency of the tasks on the results of their sub-tasks, which are enqueued at a later time and might still be pending due to the limited number of worker threads.
This can be adjusted by revoking the parallel execution if the task detects that its sub-task is still pending. For this to work, create a FutureTask for the sub-task manually and schedule it with execute rather than submit. Then proceed within the task as normally and, at the place where you would perform the sub-task in a sequential implementation, check whether you can remove the FutureTask from the ThreadPoolExecutor. Unlike cancel, this works only if it has not started yet and hence is an indicator that there are no free threads. So if remove returns true, you can perform the sub-task in-place, letting all other threads perform tasks rather than sub-tasks. Otherwise, you can wait for the result.
At this place it’s worth noting that it is ok to have more threads than CPU cores if the tasks accommodate I/O operations (or may wait for sub-tasks). The important point here is to have a limit.
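FutureTask<Integer> coWorker = new FutureTask<>(/* callable wrapping sub-task */);
executor.execute(coWorker); // execute, not submit, so that remove() can find this exact object
// proceed in the task’s sequence
if (executor.remove(coWorker)) coWorker.run(); // still queued, i.e. no free thread: do in-place
subTaskResult = coWorker.get(); // returns at once after an in-place run, otherwise waits
// proceed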
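A self-contained version of the snippet might look like the sketch below. The class name InPlaceFallback, the method name runSubTask, and the ownWork parameter (standing in for “proceed in the task’s sequence”) are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical helper wrapping the remove-or-wait pattern described above. */
final class InPlaceFallback {

    /**
     * Offers subTask to the shared pool, performs ownWork in the calling thread,
     * then either reclaims the still-queued sub-task and runs it in-place
     * or waits for the worker thread that picked it up.
     */
    static <T> T runSubTask(ThreadPoolExecutor executor, Callable<T> subTask, Runnable ownWork)
            throws InterruptedException, ExecutionException {
        FutureTask<T> coWorker = new FutureTask<>(subTask);
        executor.execute(coWorker);        // enqueue the FutureTask itself, unwrapped
        ownWork.run();                     // the task's own sequential part
        if (executor.remove(coWorker)) {   // true only if no worker has started it yet
            coWorker.run();                // no free thread: run in the calling thread
        }
        return coWorker.get();             // already done, or blocks until the worker finishes
    }
}
```

With a saturated pool the sub-task therefore runs in the thread that requested it, so a task can never end up waiting on a sub-task that is stuck in the queue behind it.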