Intel AVX: Why is there no 256-bit version of the dot product for double-precision floating-point variables? [closed]

Question

In another question on SO we tried (and succeeded) to find a way to emulate the missing AVX instruction:

 __m256d _mm256_dp_pd(__m256d m1, __m256d m2, const int mask);

Does anyone know why this instruction is missing? Partial answer here.
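For illustration, here is a minimal sketch of one way such a dot product can be emulated with existing AVX and SSE2 intrinsics. It is not necessarily the solution from the other question. The names `dot4` and `AVX_TARGET` are hypothetical, and the sketch returns a single scalar sum and ignores the `mask` parameter of the proposed signature.

```cpp
#include <immintrin.h>

// Assumption: GCC/Clang need AVX enabled per function unless built with -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper (not an Intel intrinsic): dot product of two arrays of 4 doubles.
AVX_TARGET double dot4(const double* a, const double* b) {
    __m256d va   = _mm256_loadu_pd(a);
    __m256d vb   = _mm256_loadu_pd(b);
    __m256d prod = _mm256_mul_pd(va, vb);             // {a0*b0, a1*b1, a2*b2, a3*b3}
    __m128d lo   = _mm256_castpd256_pd128(prod);      // lower 128-bit half
    __m128d hi   = _mm256_extractf128_pd(prod, 1);    // upper 128-bit half
    __m128d sum2 = _mm_add_pd(lo, hi);                // {p0+p2, p1+p3}
    __m128d high = _mm_unpackhi_pd(sum2, sum2);       // {p1+p3, p1+p3}
    return _mm_cvtsd_f64(_mm_add_sd(sum2, high));     // (p0+p2) + (p1+p3)
}
```

The explicit `_mm256_extractf128_pd` step is what bridges the two 128-bit halves; the remaining reduction is ordinary 128-bit SSE2 arithmetic.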

Paul R · Accepted Answer

The underlying reason for this and various other AVX limitations is that, architecturally, AVX is little more than two SSE execution units side by side. You will notice that virtually no AVX instructions operate horizontally across the boundary between the two 128-bit halves of a vector (which is particularly annoying in the case of vpalignr). In general you effectively just get two 128-bit SSE operations in parallel, which is useful for the majority of instructions, which just operate element-wise, but not as useful as a proper 256-bit SIMD implementation.
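The lane-wise behaviour described above can be seen with `_mm256_hadd_pd`, which adds adjacent pairs within each 128-bit half and never across the boundary. The following is a small sketch; `hadd_lanes` and `AVX_TARGET` are hypothetical names introduced only for this example.

```cpp
#include <immintrin.h>

// Assumption: GCC/Clang need AVX enabled per function unless built with -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper: stores the 256-bit horizontal add of a[0..3] and b[0..3] into out[0..3].
AVX_TARGET void hadd_lanes(const double* a, const double* b, double* out) {
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    // Result layout: {a0+a1, b0+b1 | a2+a3, b2+b3}
    // Each 128-bit half is processed independently, like two SSE3 haddpd operations.
    _mm256_storeu_pd(out, _mm256_hadd_pd(va, vb));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the output is `{3, 30, 7, 70}`: no element of the result combines values from both halves of an input.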

Tags: c++, performance, avx, simd

gleeen.gould
