I am trying to write AVX2 code using intrinsics, and I want to know how to use Intel intrinsics to broadcast the lowest word (16-bit element) of a YMM register to the entire YMM register. I know that in assembly I could just write
vpbroadcastw ymm1, xmm0
because the lowest word of ymm0 is also the lowest word of xmm0. I have a variable x that holds a YMM value. But
_mm256_broadcastw_epi16((__m128i) x)
where x is an __m256i, fails to compile: gcc refuses to convert between vector types of different sizes.
rq_recip3_new.c:381:5: error: can’t convert a value of type ‘__m256i {aka __vector(4) long long int}’ to vector type ‘__vector(2) long long int’ which has different size
I don't think this matters, but my machines use gcc 6.4.1 and gcc 7.3 (Fedora 25 and Ubuntu 16.04 LTS, respectively).
The following should work:
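#include <immintrin.h>

// Broadcast the lowest 16-bit word of x to all 16 word lanes of the result.
__m256i broadcast_word(__m256i x){
    return _mm256_broadcastw_epi16(_mm256_castsi256_si128(x));
}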
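With intrinsics, _mm256_castsi256_si128 is the right way to cast from 256 to 128 bits. It reinterprets the low 128 bits of the __m256i as an __m128i and does not generate any instruction, since xmm0 is simply the low half of ymm0.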
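To check the behavior on real hardware, here is a minimal self-test sketch. It assumes GCC or Clang: the target("avx2") function attribute lets the file compile without -mavx2 on the command line, and __builtin_cpu_supports skips the check on CPUs without AVX2. The helper names check_broadcast and broadcast_selftest are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Same technique as above; target("avx2") is a GCC/Clang extension added
   (as an assumption) so this builds without -mavx2. */
__attribute__((target("avx2")))
static __m256i broadcast_word(__m256i x)
{
    /* Reinterpret the low 128 bits as __m128i (no instruction), then copy
       word 0 into all 16 word lanes (vpbroadcastw). */
    return _mm256_broadcastw_epi16(_mm256_castsi256_si128(x));
}

/* Hypothetical helper: returns 1 if every output word equals input word 0. */
__attribute__((target("avx2")))
static int check_broadcast(void)
{
    int16_t in[16], out[16];
    for (int i = 0; i < 16; i++)
        in[i] = (int16_t)(1000 + i);          /* word 0 is 1000 */

    __m256i x = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out, broadcast_word(x));

    for (int i = 0; i < 16; i++)
        if (out[i] != in[0])
            return 0;
    return 1;
}

/* Hypothetical entry point: 1 = correct, 0 = mismatch,
   -1 = CPU lacks AVX2, so the check was skipped. */
int broadcast_selftest(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return check_broadcast();
}
```

Calling broadcast_selftest() from main avoids executing AVX2 instructions on a CPU that does not support them; the runtime check is what makes that safe.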
On the Godbolt Compiler Explorer, gcc 7.3 compiles this to the following (AVX2 must be enabled, e.g. with -mavx2):
broadcast_word:
vpbroadcastw ymm0, xmm0
ret
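For comparison, _mm256_extracti128_si256(x, 0) (AVX2) also yields the low 128 bits. The cast is the more direct choice because it is a pure reinterpretation that generates no instruction, while the extract intrinsic corresponds to vextracti128; compilers may fold the index-0 case away, but that is an optimization rather than a guarantee. The sketch below, again assuming GCC or Clang and using hypothetical function names, checks that both routes broadcast the same word.

```c
#include <immintrin.h>
#include <string.h>

/* Low half obtained by reinterpretation (no instruction needed). */
__attribute__((target("avx2")))
static __m256i via_cast(__m256i x)
{
    return _mm256_broadcastw_epi16(_mm256_castsi256_si128(x));
}

/* Low half obtained by extracting 128-bit lane 0 (vextracti128). */
__attribute__((target("avx2")))
static __m256i via_extract(__m256i x)
{
    return _mm256_broadcastw_epi16(_mm256_extracti128_si256(x, 0));
}

__attribute__((target("avx2")))
static int compare_impl(void)
{
    /* setr puts the first argument in word 0, so -7 should be broadcast. */
    __m256i x = _mm256_setr_epi16(-7, 1, 2, 3, 4, 5, 6, 7,
                                  8, 9, 10, 11, 12, 13, 14, 15);
    short a[16], b[16];
    _mm256_storeu_si256((__m256i *)a, via_cast(x));
    _mm256_storeu_si256((__m256i *)b, via_extract(x));
    return memcmp(a, b, sizeof a) == 0 && a[15] == -7;
}

/* Hypothetical entry point: 1 = both agree, 0 = they differ,
   -1 = CPU lacks AVX2, so the check was skipped. */
int compare_cast_and_extract(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return compare_impl();
}
```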