 

How do I broadcast the lowest word of a __m256i?

I am trying to write AVX2 code using intrinsics and want to know how to use Intel intrinsics to broadcast the lowest word of a YMM register to the entire YMM register. I know that in assembly I could just write

vpbroadcastw ymm1, xmm0

because the lowest word of ymm0 is also the lowest word of xmm0. My variable x holds a value in a YMM register, but

_mm256_broadcastw_epi16((__m128i) x)

where x is an __m256i, fails to compile, because GCC cannot convert between vector types of different sizes:

rq_recip3_new.c:381:5: error: can’t convert a value of type ‘__m256i {aka __vector(4) long long int}’ to vector type ‘__vector(2) long long int’ which has different size

I don't think it matters, but my machines use gcc 6.4.1 and 7.3 (Fedora 25 and Ubuntu 16.04 LTS, respectively).

asked Dec 29 '18 09:12 by Bo-Yin Yang



1 Answer

The following should work:

#include <immintrin.h>

/* Broadcast the lowest 16-bit word of x to all 16 word lanes (AVX2). */
__m256i broadcast_word(__m256i x){
    return _mm256_broadcastw_epi16(_mm256_castsi256_si128(x));
}

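With intrinsics, _mm256_castsi256_si128 is the right way to cast from 256 to 128 bits: it reinterprets the low 128 bits of the __m256i as an __m128i and normally compiles to no instruction at all.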

On the Godbolt Compiler Explorer, gcc 7.3 compiles this to:

broadcast_word:
  vpbroadcastw ymm0, xmm0
  ret
answered Nov 04 '22 16:11 by wim
