
Turning off Hyper-Threading in 6-core Intel Xeon

We got a 12-core Mac Pro to do some Monte Carlo calculations. Its Intel Xeon processors have Hyper-Threading (HT) enabled, so in fact 24 processes have to run in parallel to fully utilize them. However, our calcs run more efficiently at 12x100% than at 24x50%, so we tried to turn Hyper-Threading off via the Processor pane in System Preferences in order to get higher performance. One can also turn HT off with

hwprefs -v cpu_ht=false

Then we ran some tests and here is what we got:

  1. 12 parallel tasks take the same time w/ or w/o HT, to our disappointment.
  2. 24 parallel tasks lose 20% if HT is off (not 50% as we thought).
  3. When HT is on, switching from 24 to 12 tasks decreases efficiency by 20% (also surprising).
  4. When HT is off, switching from 24 to 12 tasks doesn't change anything.
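
Put on a common scale, the four observations are consistent with one another. The following is only a sketch: it assumes that "lose 20%" means total throughput falls to 0.8 of the baseline, and it takes 24 tasks with HT on as the baseline of 1.0. The dictionary and variable names are illustrative and are not part of the actual test setup.

```python
# Relative throughput per configuration, normalised to 24 tasks with HT on.
# Assumption: "lose 20%" is read as throughput falling to 0.8 of the baseline.
baseline = 1.0
throughput = {
    ("HT on", 24): baseline,
    ("HT on", 12): baseline * 0.8,  # observation 3
}
# Observation 1: 12 tasks take the same time with or without HT.
throughput[("HT off", 12)] = throughput[("HT on", 12)]
# Observation 4: with HT off, going from 24 to 12 tasks changes nothing.
throughput[("HT off", 24)] = throughput[("HT off", 12)]

# Observation 2 then serves as a consistency check.
loss = 1 - throughput[("HT off", 24)] / throughput[("HT on", 24)]

for (ht, tasks), value in sorted(throughput.items()):
    print(f"{ht:6s} {tasks:2d} tasks: {value:.2f}")
print(f"loss for 24 tasks when HT is switched off: {loss:.0%}")
```

In this reading, every configuration except 24 tasks with HT on delivers the same throughput.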

It seems that Hyper-Threading just decreases performance for our calculations and there is no way to avoid it. The program we use for the calcs is written in Fortran and compiled with gfortran. Is there a way to make it more efficient with this piece of hardware?


Update: Our Monte Carlo calculations (MCC) are typically done in steps, to avoid data loss and for other reasons (it's not always possible to avoid such steps). In our case each step consists of many simulations of variable duration. Since each step is split between a number of parallel tasks, the tasks also have variable duration. Essentially, all the faster tasks have to wait until the slowest one is done. This forces us to make the steps bigger, so that averaging reduces the spread in task times and the processors do not waste time waiting. This is our motivation for having 12*2.66 GHz instead of 24*1.33 GHz. If it were possible to turn HT off, we would get about +10% performance by switching from 24 tasks w/ HT to 12 tasks w/o HT. However, the tests show that we lose 20%. So my conclusion is that the calculation ends up about 30% less efficient than we expected.
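
The waiting effect can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the function name `step_efficiency` is hypothetical, and the exponential distribution of simulation times is chosen purely for illustration; neither comes from our actual Fortran code or measured data.

```python
import random


def step_efficiency(n_tasks, sims_per_task, n_steps=50, seed=0):
    """Average fraction of time the tasks spend computing rather than waiting.

    Each task runs `sims_per_task` simulations per step. A step ends when the
    slowest task finishes, so the efficiency of one step is
    (mean task time) / (max task time).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        # Hypothetical duration model: each simulation takes an exponentially
        # distributed time with mean 1 (an assumption, not measured data).
        times = [
            sum(rng.expovariate(1.0) for _ in range(sims_per_task))
            for _ in range(n_tasks)
        ]
        total += (sum(times) / n_tasks) / max(times)
    return total / n_steps


# Compare short and long steps for 12 and 24 parallel tasks.
for sims in (10, 100, 1000):
    print(sims, round(step_efficiency(12, sims), 3), round(step_efficiency(24, sims), 3))
```

In this model, longer steps average out the variation between tasks, so less time is lost waiting for the slowest one.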

For the tests I used quite large steps; usually the steps are shorter, so efficiency drops even further.

There is one more reason: some of our calculations require 3-5 GB of memory per task, so you can probably see how much more economical it would be for us to have 12 fast tasks. We are working on implementing shared memory, but that is going to be a long-term project. Therefore we need to find out how to make the existing hardware/software as fast as possible.

Andrei Fokau asked Oct 04 '10 11:10



1 Answer

This is more of an extended comment than an answer:

I don't find your observations terrifically surprising. Hyper-threading is a poor man's approach to parallelisation: it allows you to have 2 pipelines of pending instructions on one CPU. But it doesn't provide extra floating-point or integer arithmetic units or more registers; when one pipeline is unable to feed the ALU (or whatever it's called these days), the other pipeline is activated within a clock cycle or two. This contrasts with the situation on a CPU without hyperthreading where, when the instruction pipeline stalls, it has to be flushed and refilled with instructions from another process before the CPU gets back up to speed.

The Wikipedia article on hyperthreading explains all this rather well.

If you are running loads in which pipeline stalls are perfectly synchronised and represent a major part of the total execution time of your program mix, then you might double the speed of a program by going from an unhyperthreaded processor to a hyperthreaded processor.

IF (that's a big if) you could write a program which never stalled in the instruction pipeline then hyperthreading would provide no benefit (in terms of execution acceleration) whatsoever. What you have measured is not a speedup due to HT (well, it is a speedup due to HT but you don't actually want that) but the failure of your threads to keep the pipeline moving.
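
A toy model, offered only as a rough sketch, makes these two extremes concrete. It assumes that each thread keeps a core's execution units busy for a fraction `1 - stall_fraction` of the time and that a second hardware thread fills the gaps perfectly. Real hardware is messier than this, and the function names `ht_speedup` and `implied_stall_fraction` are hypothetical.

```python
def ht_speedup(stall_fraction):
    """Idealised throughput gain of 2 hardware threads over 1 on a single core.

    Assumption: each thread keeps the execution units busy for a fraction
    (1 - stall_fraction) of the time, and the second thread fills the gaps
    perfectly, up to full utilisation of the core.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy = 1.0 - stall_fraction
    return min(1.0, 2.0 * busy) / busy


def implied_stall_fraction(speedup):
    """Invert the model for an observed speedup of at least 1 and below 2."""
    if not 1.0 <= speedup < 2.0:
        raise ValueError("speedup must be in [1, 2)")
    return 1.0 - 1.0 / speedup


# No stalls: no benefit. Stalled half the time or more: speed doubles.
print(ht_speedup(0.0), ht_speedup(0.5), ht_speedup(0.75))

# The question reports 12 tasks reaching 0.8 of the 24-task throughput with
# HT on, i.e. an HT speedup of 1 / 0.8. Under this model that corresponds to
# each thread stalling for roughly this fraction of the time:
print(round(implied_stall_fraction(1 / 0.8), 3))
```

Read this way, the measured gain from HT is an estimate of how much of each core's time a single thread leaves unused.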

What you have to do is actually decrease the speedup due to HT! Or, rather, you have to increase the execution rate of the 12 processes (one per core) by keeping the pipeline filled. Personally, I'd switch off hyperthreading while I optimised the program's execution on 12 cores.

Have fun.

High Performance Mark answered Sep 23 '22 13:09
