I am writing a CPU-intensive image processing library. To make best use of available CPU, I can detect the total number of cores on my machine and have my library run with that number of threads. When my library to allocate one thread for each core it performs optimally using 100% available processor time.
The above approach works fine when mine is the only CPU-heavy process running. If another CPU-intensive process is running, or even another instance of my own code, then the OS allocates us only a fraction of the available cores and my library then has too many threads running which is both inefficient and inconsiderate to other processes.
So I would like to find a way to determine the "fair share" number of threads to run given a specific load. For example, if two instances of my process are running on an 8-core machine, each would run with 4 threads. Each would need a way to adapt thread count dynamically according to fluctuations in machine load.
So, my question:
My focus is Windows but interested in non-Windows solutions too.
Edit: to be clear, this is about optimization. I am trying to achieve peak efficiency by running the optimal number of threads appropriate to my fair share of the CPU.
In my eyes, the application shouldnt decide how many threads to spawn. This is an information, that the caller should know. In linux, the "-j" or "--jobs" parameter is widely used (Default: 1).
What about also setting the priority of the processing tasks. So if the caller knows, the processing is mission-critical, he can increase the prio (with the knowledge of maybe blocking the (whole) system). Your processing lib would never know, how important the processing of this image would be. If the caller doesnt care, then the default low-prio is used, which shouldnt affect the rest of the system. If it does, you should look to what is exactly blocking the system (maybe writing image files to the hdd, reduce ram size to prevent swapping, ...). If you figured out that, you can optimize exactly that point.
If you start the processing with (cpu-cores)*2 on low till normal priority, your system should be useable. No one would expect, that this will kill the system.
Just my 2 cents.
Actually it's not a problem of multithreading but a problem of executing many programs simultaneously. This is hard on most PC's operating systems because it conflicts to the idea of time-sharing.
Let's assume some workflow.
Suppose we have 8 cores and we create 8 threads to feed them; ok, that's easy. Next we choose to monitor core loading to summary how many tasks running on a certain core; well, that needs some statistical assumptions, e.g on Linux you can get a 1/5/15-mins load average chart, but that could be done. The statistical chart is clear and now we get a plot about how many CPU-bound processes are running, say, seeing other 3 CPU-intensive processes.
Then we come to the point: we have to make 3 redundant threads to sleep, but which 3?
Usually we choose 3 threads arbitrarily because the scheduler arranges the other 8 CPU-bound threads automatically. In some cases, we explicitly put threads on high load cores to sleep, assign other threads to certain low load cores, and let the scheduler do the rest things. Most scheduling policies also try to "keep CPU cache hot", which means they tend to forbid transferring threads between cores. We reasonably expect our CPU-intensive threads can utilize the core cache since other processes are scheduled to the 3 crowded cores. Everything looks good.
However this could fail in tightly synchronized computation. In this scenario we need to run our 5 threads simultaneously. Simultaneity here means the 5 threads have to gain CPU and run at almost the same time. I don't know if there's any scheduler on PC could do this for us. In most low-load cases, things still work fine because costs to wait for simultaneity is trivial. But when the load of a core is high and even 1 of our 5 threads is disturbed, occasionally we'll find we spend many life cycles in waiting.
It may help to schedule your program as a real-time program but it's not a perfect solution. Statistically it leads to a wider time window for simultaneity when it gains more CPU control priority. I have to say, it's not guaranteed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With