
Linux 2.6.31 Scheduler and Multithreaded Jobs

I run massively parallel scientific computing jobs on a shared Linux machine with 24 cores. When nothing else is running on it, my jobs usually scale to all 24 cores. However, when even one single-threaded job that isn't mine is running, my 24-thread jobs (which I run at high nice values) only manage to get ~1800% CPU (in Linux notation, where 100% is one full core). Meanwhile, about 500% of the CPU cycles (same notation) sit idle. Can anyone explain this behavior, and what can I do to get all 23 cores that aren't being used by someone else?

Notes:

  1. In case it's relevant, I have observed this on slightly different kernel versions, though I can't remember which off the top of my head.

  2. The CPU architecture is x64. Is it at all possible that the fact that my 24-thread jobs are 32-bit and the other jobs I'm competing with are 64-bit is relevant?

Edit: One thing I just noticed is that going up to 30 threads seems to alleviate the problem to some degree. It gets me up to ~2100% CPU.

dsimcha asked May 13 '10 16:05


1 Answer

It is possible that this is caused by the scheduler trying to keep each of your tasks running on the same CPU that it was previously running on (it does this because the task has likely brought its working set into that CPU's cache - it's "cache hot").
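
One way to check whether this is what is happening is to look at which CPU each thread last ran on. The following is a minimal sketch, not a definitive diagnostic: it assumes a Linux `/proc` filesystem and reads the `processor` field (field 39) of each thread's `stat` file. The function names `last_cpu_from_stat` and `thread_cpus` are made up for this example.

```python
import os

def last_cpu_from_stat(stat_line: str) -> int:
    """Return the 'processor' field (field 39) of a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split on the LAST ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 sits at index 36.
    return int(rest[36])

def thread_cpus(pid: int) -> dict:
    """Map each thread ID of `pid` to the CPU it last ran on."""
    cpus = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                cpus[int(tid)] = last_cpu_from_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # thread exited meanwhile, or unexpected format
    return cpus
```

Calling `thread_cpus(pid)` a few times on the running job and seeing several threads repeatedly reported on the same CPU while other CPUs stay idle would be consistent with the explanation above, though it does not prove it.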

Here are a few ideas you can try:

  • Run twice as many threads as you have cores;
  • Run one or two fewer threads than you have cores;
  • Reduce the value of /proc/sys/kernel/sched_migration_cost (perhaps down to zero);
  • Reduce the value of /proc/sys/kernel/sched_domain/.../imbalance_pct down closer to 100.
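
Before changing either tunable, it helps to record the current values. Below is a read-only sketch under a few assumptions: the paths are the ones named above, which match kernels of this era (later kernels may have renamed or moved them), and the helper names `read_int` and `find_imbalance_pct` are hypothetical. Because the exact directory layout under `sched_domain` is not spelled out here, the code simply searches recursively for files named `imbalance_pct`.

```python
import glob
import os

def read_int(path):
    """Return the integer stored in a sysctl-style file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def find_imbalance_pct(root="/proc/sys/kernel/sched_domain"):
    """Return {path: value} for every imbalance_pct file found under root."""
    pattern = os.path.join(root, "**", "imbalance_pct")
    found = {}
    for path in sorted(glob.glob(pattern, recursive=True)):
        value = read_int(path)
        if value is not None:
            found[path] = value
    return found

if __name__ == "__main__":
    # Core count, useful for picking the thread counts suggested above.
    print("cores:", os.cpu_count())
    # Prints None if the file does not exist on this kernel.
    print("sched_migration_cost:",
          read_int("/proc/sys/kernel/sched_migration_cost"))
    for path, value in find_imbalance_pct().items():
        print(path, value)
```

Changing the values means writing to the same files as root, which affects every user of a shared machine, so it is worth noting the originals first and coordinating with the administrator.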
caf answered Oct 05 '22 10:10