c++ AVX512 intrinsic equivalent of _mm256_broadcast_ss()?

Question

I'm porting some code from AVX2 to AVX512.

What's the equivalent intrinsic for broadcasting a single float to a __m512 vector? In AVX2 it is _mm256_broadcast_ss(), but I can't find anything like _mm512_broadcast_ss().

Carlos · Accepted Answer

AVX512 doesn't need a special intrinsic for the memory-source version¹. You can simply use _mm512_set1_ps, which takes a float, not a float*. The compiler should emit a memory-source broadcast when that's efficient. It may even fold the broadcast into the memory operand of an ALU instruction instead of emitting a separate load; AVX512 supports such embedded broadcasts for 512-bit vectors.
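As a minimal sketch of the above, assuming GCC or Clang on x86-64: the helper names (broadcast16, scale16, cpu_has_avx512f) are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions used here only so the file builds without -mavx512f. Dereferencing a pointer inside _mm512_set1_ps gives the compiler the option of a memory-source broadcast.

```cpp
#include <immintrin.h>

// Hypothetical helper: true if the running CPU reports AVX512F.
// __builtin_cpu_supports is a GCC/Clang builtin, not standard C++.
bool cpu_has_avx512f() {
    return __builtin_cpu_supports("avx512f");
}

// Broadcast one float to all 16 lanes and store the result.
// The target attribute enables AVX512F for this function only.
__attribute__((target("avx512f")))
void broadcast16(float x, float* out) {
    __m512 v = _mm512_set1_ps(x);   // takes a float, not a float*
    _mm512_storeu_ps(out, v);
}

// Multiply 16 floats by a scalar that lives in memory.
// The compiler is free to load-and-broadcast *scale separately or to
// fold it into the multiply as an embedded-broadcast memory operand;
// which one it picks depends on the compiler and optimization level.
__attribute__((target("avx512f")))
void scale16(const float* src, const float* scale, float* dst) {
    __m512 s = _mm512_set1_ps(*scale);
    __m512 v = _mm512_loadu_ps(src);
    _mm512_storeu_ps(dst, _mm512_mul_ps(v, s));
}
```

Call these only after cpu_has_avx512f() returns true; executing AVX512 instructions on a CPU without them raises an illegal-instruction fault.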

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_set1_ps&expand=5236,4980

Footnote 1: _mm256_broadcast_ss probably exists separately from _mm256_set1_ps because of AVX1 vbroadcastss ymm, [mem] vs. AVX2 vbroadcastss ymm, xmm. Some compilers, such as MSVC and ICC, let you use an intrinsic without enabling the corresponding ISA extension for the compiler's own code generation, so there needed to be an intrinsic that maps specifically to the AVX1 memory-source form.

With AVX512, both the memory-source and register-source forms were introduced together in AVX512F, so there's no need to give users of those compilers a way to micro-manage which asm is allowed.
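For comparison, here is a small sketch of the two 256-bit spellings the footnote contrasts, again assuming GCC or Clang on x86-64. The helper names are hypothetical. Both functions produce the same vector; the difference is only that _mm256_broadcast_ss takes a pointer (matching the AVX1 memory-source instruction) while _mm256_set1_ps takes a value and leaves instruction selection to the compiler.

```cpp
#include <immintrin.h>

// Hypothetical helper: true if the running CPU reports AVX.
bool cpu_has_avx() {
    return __builtin_cpu_supports("avx");
}

// Pointer-taking form: corresponds to vbroadcastss ymm, [mem] (AVX1).
__attribute__((target("avx")))
void broadcast8_from_memory(const float* p, float* out) {
    __m256 v = _mm256_broadcast_ss(p);
    _mm256_storeu_ps(out, v);
}

// Value-taking form: the compiler chooses how to materialize the vector.
__attribute__((target("avx")))
void broadcast8_from_value(float x, float* out) {
    __m256 v = _mm256_set1_ps(x);
    _mm256_storeu_ps(out, v);
}
```

In the 512-bit case only the value-taking style (_mm512_set1_ps) is needed, as the answer explains.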

Tags: c++, intel, intrinsics, avx512, avx2