In another question on SO we tried (and succeeded) to find a replacement for this missing AVX instruction:
__m256d _mm256_dp_pd(__m256d m1, __m256d m2, const int mask);
Does anyone know why this instruction is missing? Partial answer here.
The underlying reason for this and various other AVX limitations is that architecturally AVX is little more than two SSE execution units side by side. You will notice that virtually no AVX instructions operate horizontally across the boundary between the two 128-bit halves of a vector (which is particularly annoying in the case of vpalignr). In general you effectively just get two 128-bit SSE operations in parallel, which is useful for the majority of instructions, since they just operate in an element-wise fashion, but not as useful as a proper 256-bit SIMD implementation.
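To make the lane boundary concrete, here is a minimal sketch of a four-element double-precision dot product written with AVX intrinsics. The helper name `dot4_pd` is hypothetical (it is not an Intel intrinsic), and this is not necessarily the replacement arrived at in the other question. The point is that `_mm256_hadd_pd` only sums pairs inside each 128-bit half, so an explicit extract of the upper half is needed to finish the reduction. A scalar fallback is included on the assumption that the code may also be compiled without AVX enabled.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper, not an Intel intrinsic:
 * returns a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static double dot4_pd(const double *a, const double *b)
{
#if defined(__AVX__)
    __m256d x  = _mm256_loadu_pd(a);           /* unaligned load of 4 doubles */
    __m256d y  = _mm256_loadu_pd(b);
    __m256d xy = _mm256_mul_pd(x, y);          /* element-wise products */

    /* Horizontal add works per 128-bit lane only:
     * h = { xy0+xy1, xy0+xy1, xy2+xy3, xy2+xy3 } */
    __m256d h  = _mm256_hadd_pd(xy, xy);

    /* Crossing the lane boundary has to be done by hand. */
    __m128d lo = _mm256_castpd256_pd128(h);    /* low lane:  xy0+xy1 */
    __m128d hi = _mm256_extractf128_pd(h, 1);  /* high lane: xy2+xy3 */

    return _mm_cvtsd_f64(_mm_add_sd(lo, hi));  /* sum of both lanes */
#else
    /* Scalar fallback (assumption: built without AVX enabled),
     * using the same per-lane summation order. */
    double lo = a[0] * b[0] + a[1] * b[1];
    double hi = a[2] * b[2] + a[3] * b[3];
    return lo + hi;
#endif
}
```

With GCC or Clang the intrinsic path is selected when compiling with `-mavx`. Calling `dot4_pd` on `{1, 2, 3, 4}` and `{5, 6, 7, 8}` gives 70.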