
GNU make: should the number of jobs equal the number of CPU cores in a system?

There seems to be some controversy about whether the number of jobs in GNU make should equal the number of CPU cores, or whether you can optimize the build time by adding one extra job that can be queued up while the others "work".

Is it better to use -j4 or -j5 on a quad-core system?

Have you seen (or done) any benchmarking that supports one or the other?
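
For concreteness, a minimal sketch of the two options on a Linux system, assuming nproc from GNU coreutils is available to report the CPU count (the exact invocations are illustrative):

    # one job per CPU that nproc reports
    make -j"$(nproc)"

    # one extra job, so a queued job is ready whenever another stalls
    make -j"$(($(nproc) + 1))"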

asked Mar 23 '10 by Johan


2 Answers

I would say the best thing to do is benchmark it yourself on your particular environment and workload. Seems like there are too many variables (size/number of source files, available memory, disk caching, whether your source directory & system headers are located on different disks, etc.) for a one-size-fits-all answer.
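
A minimal sketch of that kind of benchmark, assuming a Unix-like shell and a make clean target that forces a full rebuild (the job counts are just illustrative):

    # Time a clean build at several job counts; compare the wall-clock (real) times
    for j in 1 2 3 4 5 8; do
        make clean > /dev/null
        echo "jobs=$j"
        time make -j"$j" > /dev/null
    done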

My personal experience (on a 2-core MacBook Pro) is that -j2 is significantly faster than -j1, but beyond that (-j3, -j4 etc.) there's no measurable speedup. So for my environment "jobs == number of cores" seems to be a good answer. (YMMV)

answered Sep 29 '22 by David Gelhar


I've run my home project on my 4-core laptop with hyperthreading and recorded the results. It is a fairly compiler-heavy project, but it ends with a unit test that takes 17.7 seconds. The compiles are not very I/O intensive: there is plenty of memory available, and whatever does not fit in memory sits on a fast SSD.

1 job      real   2m27.929s    user   2m11.352s    sys    0m11.964s
2 jobs     real   1m22.901s    user   2m13.800s    sys    0m9.532s
3 jobs     real   1m6.434s     user   2m29.024s    sys    0m10.532s
4 jobs     real   0m59.847s    user   2m50.336s    sys    0m12.656s
5 jobs     real   0m58.657s    user   3m24.384s    sys    0m14.112s
6 jobs     real   0m57.100s    user   3m51.776s    sys    0m16.128s
7 jobs     real   0m56.304s    user   4m15.500s    sys    0m16.992s
8 jobs     real   0m53.513s    user   4m38.456s    sys    0m17.724s
9 jobs     real   0m53.371s    user   4m37.344s    sys    0m17.676s
10 jobs    real   0m53.350s    user   4m37.384s    sys    0m17.752s
11 jobs    real   0m53.834s    user   4m43.644s    sys    0m18.568s
12 jobs    real   0m52.187s    user   4m32.400s    sys    0m17.476s
13 jobs    real   0m53.834s    user   4m40.900s    sys    0m17.660s
14 jobs    real   0m53.901s    user   4m37.076s    sys    0m17.408s
15 jobs    real   0m55.975s    user   4m43.588s    sys    0m18.504s
16 jobs    real   0m53.764s    user   4m40.856s    sys    0m18.244s
inf jobs   real   0m51.812s    user   4m21.200s    sys    0m16.812s

Basic results:

  • Scaling up to the core count increases performance nearly linearly. The real time went down from roughly 2.5 minutes to 1.0 minute (2.5x as fast), while the CPU time spent compiling (user) went up from 2:11 to 2:50. The system (sys) time barely increased in this range.
  • Scaling from the core count to the thread count increases the user time immensely, from 2:50 to 4:38. This near doubling is most likely because the compiler instances compete for the same CPU resources. The system gets somewhat more loaded with requests and task switching, pushing the sys time up to about 17.7 seconds. The gain is about 6.5 seconds on a compile time of 53.5 seconds, a 12% speedup.
  • Scaling from the thread count to double the thread count gives no significant speedup. The timings at 12 and 15 jobs are most likely statistical anomalies that can be disregarded. The total time taken increases ever so slightly, as does the system time, both most likely due to increased task switching. There is no benefit to going beyond the thread count.

My guess right now: if you do something else on your computer, use the core count; if you do not, use the thread count. Exceeding the thread count shows no benefit. At some point the jobs become memory-limited and performance collapses because of it, making the compile much slower. The "inf" line was added at a much later date, which makes me suspect there was some thermal throttling on the 8+ job runs. It does show that for this project size there is no memory or throughput limit in effect. It's a small project, though, given 8 GB of memory to compile in.
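
If you do run other work on the machine, a related knob is GNU make's load-average limit (-l), which holds back new jobs while the system load is above a threshold; a minimal sketch, assuming GNU make and coreutils nproc:

    # allow one job per CPU, but don't start new jobs while the load average exceeds that count
    make -j"$(nproc)" -l"$(nproc)"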

answered Sep 29 '22 by dascandy