
_mm_load_ps vs. _mm_load_pd vs. etc on Intel x86 ISA

Tags: c, x86, intel, simd, sse

What's the difference between the following two lines?

__m128 x = _mm_load_ps((float *) ptr);
__m128d y = _mm_load_pd((double *)ptr);

In other words, why are there so many different _mm_load_xyz intrinsics, instead of a generic __m128 _mm_load(const void *)?

user541686 asked Jan 13 '12 19:01


1 Answer

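There are different intrinsics because they correspond to different instructions: _mm_load_ps maps to MOVAPS, _mm_load_pd to MOVAPD, and _mm_load_si128 to MOVDQA. They also return different types (__m128, __m128d, and __m128i respectively).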

There are different load instructions because Intel wants to keep the freedom to design a processor on which double-precision vectors are backed by a different physical register file than single-precision or integer vectors, or are handled by different execution units. Either design could add latency if there were no way to specify which register file or forwarding network the loaded data should go to.

One way to think about it is that the different instructions do the "same thing", but additionally provide a hint to the processor telling it how the data that is being loaded will be used by future instructions. This may help the processor make sure that the data is in the right place to be used as efficiently as possible, or it may be ignored by the processor.
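As a rough illustration of the "same thing" part, the sketch below loads the same 16 bytes with all three aligned load intrinsics and checks that each result stores back bit-for-bit identical. The helper name loads_are_bitwise_identical is made up for this example, and the instruction names in the comments are what the intrinsics nominally map to; a compiler is free to emit something else.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128, __m128d, __m128i and their loads */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loads the same 16 bytes three ways, stores each
   result to a scratch buffer, and returns 1 if every buffer matches the
   source byte for byte. */
static int loads_are_bitwise_identical(const void *src16)
{
    /* The aligned load intrinsics require a 16-byte-aligned address. */
    assert(((uintptr_t)src16 & 15u) == 0);

    __m128  as_ps = _mm_load_ps((const float *)src16);       /* nominally MOVAPS */
    __m128d as_pd = _mm_load_pd((const double *)src16);      /* nominally MOVAPD */
    __m128i as_si = _mm_load_si128((const __m128i *)src16);  /* nominally MOVDQA */

    /* Unaligned stores, so the scratch buffers need no special alignment. */
    unsigned char out_ps[16], out_pd[16], out_si[16];
    _mm_storeu_ps((float *)out_ps, as_ps);
    _mm_storeu_pd((double *)out_pd, as_pd);
    _mm_storeu_si128((__m128i *)out_si, as_si);

    return memcmp(out_ps, src16, 16) == 0
        && memcmp(out_pd, src16, 16) == 0
        && memcmp(out_si, src16, 16) == 0;
}
```

The loads differ only in the C type they return and in the domain hint they give the processor, not in the bits they move.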

This isn't just hypothetical. There are processors on which using an integer vector load (MOVDQA) to load data that is consumed by a floating-point operation takes more time than using a floating-point load for the same data (and vice versa). See the Intel Optimization Manual or Agner Fog's notes for more detail on the subject. Use the load that matches how you will use the data to avoid the risk of such performance hazards in the future.
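A minimal sketch of that advice, assuming 16-byte-aligned buffers of four elements each: float data that feeds a float add is loaded with _mm_load_ps, and integer data that feeds an integer add is loaded with _mm_load_si128. The function names add_floats4 and add_ints4 are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stdint.h>

/* Returns 1 if p is 16-byte aligned, as the aligned loads/stores require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Float data consumed by a float operation: use the float load and store. */
static void add_floats4(const float *a, const float *b, float *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}

/* Integer data consumed by an integer operation: use the integer load and store. */
static void add_ints4(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Both functions would still compute correct results if the loads were swapped and the values reinterpreted; the point of matching them is only to avoid a possible cross-domain delay on processors that have one.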

Stephen Canon answered Sep 28 '22 06:09