I want to run a batch of, say, 20 CPU-intensive computations (basically really long nested for loops) on a machine.
Each of these 20 jobs doesn't share data with the other 19.
If the machine has N cores, should I spin off N-1 of these jobs then? Or N? Or should I just launch all 20, and have Windows figure out how to schedule them?
Unfortunately, there is no simple answer. The only way to know for sure is to implement and then profile your application.
Typically, for maximum throughput, if the jobs are pure CPU, you'd want one per core. Depending on the type of work, that means either one per hyperthreaded (logical) core or one per "true" physical core. (If the work is identical for all 20 jobs, hyperthreading often slows down the overall run...)
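As a minimal sketch of the one-per-core approach in Python (the `run_job` body here is just a hypothetical stand-in for your nested loops):

```python
import os
from concurrent.futures import ProcessPoolExecutor

def run_job(job_id: int) -> int:
    """Stand-in for one CPU-bound computation (the long nested loops)."""
    total = 0
    for i in range(10_000_000):
        total += (total + i) % 97
    return total

if __name__ == "__main__":  # required on Windows for process spawning
    # os.cpu_count() counts logical cores (hyperthreads included); for
    # identical pure-CPU jobs, the physical-core count may be the better
    # pool size, as noted above.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_job, range(20)))
    print(results)
```

All 20 jobs get queued, but only `workers` of them run at once; the pool feeds the next job to whichever process finishes first.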
If the jobs have any non-CPU functionality (such as reading a file, waiting on a network response, etc.), then more than one work item per core tends to be much better, since another job can use the core while one is blocked. In many situations this noticeably improves overall throughput.
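For that mixed case, the same sketch works with a larger pool; the 2x oversubscription factor below is only an assumed starting point to profile against, not a rule, and the `time.sleep` is a stand-in for a real I/O wait:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def run_mixed_job(job_id: int) -> int:
    """Hypothetical job: a simulated I/O wait followed by CPU work."""
    time.sleep(0.5)  # stand-in for a file read or other blocking wait
    total = 0
    for i in range(5_000_000):
        total += (total + i) % 97
    return total

if __name__ == "__main__":
    # More workers than cores lets another job run while one is blocked;
    # tune the factor by measuring, as suggested above.
    workers = 2 * (os.cpu_count() or 1)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_mixed_job, range(20)))
    print(results)
```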