What's the difference between _mm_broadcast_ss() and _mm_load_ps1()?
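#include <immintrin.h>  // _mm_broadcast_ss (AVX), _mm_load_ps1, _mm_store_ps
#include <iostream>

void example() {
    __declspec(align(32)) const float num = 20;

    // Broadcast with the AVX intrinsic, then store to inspect the lanes.
    __m128 a1 = _mm_broadcast_ss(&num);
    __declspec(align(32)) float f1[4];
    _mm_store_ps(f1, a1);
    std::cout << f1[0] << " " << f1[1] << " " << f1[2] << " " << f1[3] << "\n";

    // Same thing with the SSE convenience intrinsic.
    __m128 a2 = _mm_load_ps1(&num);
    __declspec(align(32)) float f2[4];
    _mm_store_ps(f2, a2);
    std::cout << f2[0] << " " << f2[1] << " " << f2[2] << " " << f2[3] << "\n";
}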
I got the same output both ways, so why do they both exist?
_mm_broadcast_ss only compiles for AVX targets. _mm_load1_ps / _mm_load_ps1 will compile to multiple instructions (movss / shufps) when compiling for targets that don't support AVX. When you are compiling for an AVX target, any good compiler will use a vbroadcastss to implement them.
load1 / set1 and other convenience functions were introduced early on, because it's often good to let the compiler pick the optimal strategy for moving data around.
The _mm_broadcast_* intrinsics were introduced as direct wrappers around the vbroadcastss / vbroadcastsd instructions. (AVX2 has integer vpbroadcast..., and the reg-reg forms of vbroadcastss. AVX1 only has vbroadcastss x/ymm, [mem].)
In practice, just use _mm_load1_ps or _mm_set1_ps. It makes no difference to the code, and lets the same source build for non-AVX targets.
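A minimal sketch of that portable approach, assuming an x86 compiler with SSE available. The helper names broadcast_float and lanes are hypothetical and only exist for illustration; the intrinsics themselves are the standard SSE ones.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load1_ps, _mm_set1_ps, _mm_storeu_ps
#include <array>

// Hypothetical helper: broadcast one float from memory into all four lanes.
// _mm_load1_ps needs only SSE, so this builds with or without AVX enabled;
// the compiler is free to pick vbroadcastss when AVX is available.
inline __m128 broadcast_float(const float* p) {
    return _mm_load1_ps(p);
}

// Hypothetical helper: copy the four lanes out so they can be inspected.
inline std::array<float, 4> lanes(__m128 v) {
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), v);  // unaligned store, so no alignment needed on out
    return out;
}
```

Calling lanes(broadcast_float(&num)) and lanes(_mm_set1_ps(num)) should give the same four values, which is the point: the source stays the same and the instruction choice is left to the compiler.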
The choice might make a difference to the asm output at -O0, but IDK. If you care about the asm output in an un-optimized build, then 1: that's weird, and 2: you'll have to see what your compiler does.
As you can see from the asm output on godbolt (for gcc):
Without AVX (-mno-avx):
bcast: compile error, so I #ifdef it out
__m128 load1(const float*p) { return _mm_load1_ps(p); }
    movss   xmm0, DWORD PTR [rdi]
    shufps  xmm0, xmm0, 0
    ret
With AVX (-mavx):
__m128 bcast(const float*p) { return _mm_broadcast_ss(p); }
    vbroadcastss    xmm0, DWORD PTR [rdi]
    ret
__m128 load1(const float*p) { return _mm_load1_ps(p); }
    vbroadcastss    xmm0, DWORD PTR [rdi]
    ret
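The #ifdef mentioned for the non-AVX build can be sketched roughly as follows. This is an illustration rather than the exact code behind the listing: the function name bcast_or_load1 is hypothetical, and it assumes the compiler defines the __AVX__ macro when AVX code generation is enabled (gcc and clang do with -mavx, MSVC does with /arch:AVX).

```cpp
#include <immintrin.h>  // pulls in the SSE and AVX intrinsic declarations

// Hypothetical wrapper: use the AVX-only intrinsic when the target has AVX,
// and fall back to the SSE convenience intrinsic otherwise.
inline __m128 bcast_or_load1(const float* p) {
#ifdef __AVX__
    return _mm_broadcast_ss(p);  // direct wrapper around vbroadcastss
#else
    return _mm_load1_ps(p);      // typically movss + shufps without AVX
#endif
}
```

Since an optimizing compiler emits vbroadcastss for _mm_load1_ps on AVX targets anyway, a guard like this is mostly useful for experiments such as the godbolt comparison; for regular code, calling _mm_load1_ps directly is simpler.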