
Speed variation between vCPUs on the same Amazon EC2 instance

I'm exploring the feasibility of running numerical computations on Amazon EC2. I currently have one c4.8xlarge instance running. It has 36 vCPUs, each of which is a hyperthread of a Haswell Xeon chip. The instance runs Ubuntu in HVM mode.

I have a GCC-optimised binary of a completely sequential (i.e. single-threaded) program. I launched 30 copies of it, each pinned to its own vCPU:

for i in `seq 0 29`; do
    nohup taskset -c $i $BINARY_PATH &> $i.out &
done

The 30 processes run almost identical calculations. There's very little disk activity (a few megabytes every 5 minutes), and there's no network activity or interprocess communication whatsoever. htop reports that all processes run constantly at 100%.

The whole thing has been running for about 4 hours at this point. Six processes (12-17) have already done their task, while processes 0, 20, 24 and 29 look as if they will require another 4 hours to complete. Other processes fall somewhere in between.

My questions are:

  1. Other than resource contention with other users, is there anything else that may be causing the significant variation in performance between the vCPUs within the same instance? As it stands, the instance would be rather unsuitable for any OpenMP or MPI jobs that synchronise between threads/ranks.
  2. Is there anything I can do to achieve a more uniform (hopefully higher) performance across the cores? I have basically excluded hyperthreading as a culprit here since the six "fast" processes are hyperthreads on the same physical cores. Perhaps there's some NUMA-related issue?
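One quick check for hypervisor-level contention is per-vCPU steal time. On Linux this is the 9th numeric field on each cpuN line of /proc/stat (the same quantity top shows as %st); uneven, growing steal values on specific vCPUs would point at the hypervisor starving those vCPUs rather than anything inside the instance. A minimal sketch:

```shell
# Print cumulative steal jiffies per vCPU (9th field after the name on each
# "cpuN" line of /proc/stat). Markedly uneven values across vCPUs suggest
# hypervisor-level contention on those particular vCPUs.
awk '/^cpu[0-9]/ { print $1, "steal =", $9 }' /proc/stat
```

Sampling this twice a few seconds apart and diffing gives the steal rate per vCPU over that interval.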
Asked Feb 19 '26 by Saran Tunyasuvunakool

1 Answer

My experience is on c3 instances. It's likely similar with c4.

For example, take a c3.2xlarge instance with 8 vCPUs (most of the explanation below is derived from a direct discussion with AWS support).

In fact, only the first 4 vCPUs map to distinct physical cores and are usable for heavy scientific calculations; the last 4 vCPUs are their hyperthread siblings. For scientific applications hyperthreading is often counterproductive: it causes extra context switching and halves the cache (and associated memory bandwidth) available per thread.

To find out the exact mapping between the vCPUs and the physical cores, look into /proc/cpuinfo:

  • "physical id" : the physical processor (socket) id; only one processor in a c3.2xlarge
  • "processor" : the vCPU number
  • "core id" : the physical core that each vCPU maps back to

If you put this in a table, you have:

 physical_id   processor    core_id
 0             0            0
 0             1            1
 0             2            2
 0             3            3
 0             4            0
 0             5            1
 0             6            2
 0             7            3
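The table above can be produced directly with a short awk pass over /proc/cpuinfo (a sketch; the field names are exactly as they appear in the file on x86 Linux):

```shell
# Emit one "processor -> physical_id, core_id" line per vCPU.
# Each per-CPU block in /proc/cpuinfo lists "processor", then "physical id",
# then "core id", so we print once the "core id" line is reached.
awk -F':' '
    /^processor/   { gsub(/[ \t]/, "", $2); cpu  = $2 }
    /^physical id/ { gsub(/[ \t]/, "", $2); phys = $2 }
    /^core id/     { gsub(/[ \t]/, "", $2);
                     printf "processor %s -> physical_id %s, core_id %s\n", cpu, phys, $2 }
' /proc/cpuinfo
```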

You can also get this from thread_siblings_list, the kernel's internal map of the hardware threads that share a core with cpuX (https://www.kernel.org/doc/Documentation/cputopology.txt):

cat /sys/devices/system/cpu/cpuX/topology/thread_siblings_list

When hyperthreading is enabled, each vCPU ("processor") is a logical core, and two logical cores are associated with each physical core.
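To see the pairing for every vCPU at once rather than one cpuX at a time, you can loop over sysfs (a sketch):

```shell
# For each CPU directory under sysfs, print the CPU name and its
# hyperthread siblings. vCPUs appearing on the same line share a
# physical core.
for d in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: %s\n' "${d##*/}" "$(cat "$d/topology/thread_siblings_list")"
done
```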

So, in your case, one solution is to disable hyperthreading with :

echo 0 > /sys/devices/system/cpu/cpuX/online

where X, for a c3.2xlarge, would be 4 through 7.
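Scripted for all four sibling vCPUs (needs root; the 4-7 range is specific to the c3.2xlarge and should be read off your own thread_siblings_list output first):

```shell
# Take the hyperthread siblings (vCPUs 4-7 on a c3.2xlarge) offline.
# Writing 0 to a CPU's "online" file removes it from the scheduler.
for i in 4 5 6 7; do
    echo 0 | sudo tee /sys/devices/system/cpu/cpu"$i"/online > /dev/null
done
```

Writing 1 back to the same files brings the vCPUs online again.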

EDIT: you can observe this behaviour only in HVM instances. In PV instances, this topology is hidden by the hypervisor: all core ids and processor ids in /proc/cpuinfo are '0'.

Answered Feb 22 '26 by Olivier Delrieu


