
Meaning of suffix "x" in intrinsics like "_mm256_set1_epi64x"

Some intrinsics have an x suffix, like _mm256_set1_epi64x. What does it mean? For reference, _mm256_set1_epi32 comes without this suffix.

Serge Rogatch asked Jul 08 '17 18:07



1 Answer

TL:DR: MMX->SSE2 conversion intrinsics took the non-x _mm_set/set1_epi64 names.

This is all guesswork based on current function names, known history, and some compiler behaviour:

The first Intel SIMD intrinsics were for MMX. __m64 is the MMX equivalent of SSE2 __m128i and AVX2 __m256i. There were no 64-bit x86 CPUs at the time, so the widest set intrinsic was __m64 _mm_set_pi32 (int e1, int e0). According to the intrinsic-finder, there still isn't any intrinsic for movq mm0, rax. I think you can/should just cast int64_t to __m64. (Although when I last experimented, within the last year or so, gcc or clang (I forget which) did a poor job optimizing the MMX asm. Aging compiler support is yet another reason to avoid MMX for new projects.)

When SSE2 was introduced in 2000 (with the Pentium 4), AMD64 / x86-64 hadn't been released yet, and wouldn't be supported by Intel for a few years. (At that time they were hoping that IA-64 / Itanium would be the future and replace x86.) I haven't checked old manuals, but I guess that
__m128i _mm_set1_epi64 (__m64 a) was available back then and
__m128i _mm_set1_epi64x (__int64 a) probably wasn't. (Notice that __int64 is not int64_t from <stdint.h>. But it is a 64-bit integer type and is nothing to worry about.)
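To make the naming concrete, here is a minimal sketch using only SSE2, so it should build for any x86-64 target without extra flags. The helper names broadcast_epi64x and broadcast_epi32 are made up for illustration; only the _mm_* calls are real intrinsics. The 64-bit broadcast takes a plain scalar and so uses the x name, while the 32-bit one never needed an x because it has no __m64-taking sibling to collide with.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: broadcast a scalar 64-bit integer to both lanes.
   The scalar-argument form is the one spelled with the "x" suffix. */
void broadcast_epi64x(int64_t value, int64_t out[2])
{
    assert(out != NULL);
    __m128i v = _mm_set1_epi64x((long long)value);
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned store of both lanes */
}

/* Hypothetical helper: broadcast a scalar 32-bit integer to all four lanes.
   There is no "x" here: _mm_set1_epi32 already takes a plain int. */
void broadcast_epi32(int32_t value, int32_t out[4])
{
    assert(out != NULL);
    __m128i v = _mm_set1_epi32(value);
    _mm_storeu_si128((__m128i *)out, v);
}
```

The non-x _mm_set1_epi64 is deliberately left out of the sketch, because it takes an __m64 and support for that type varies between compilers and targets.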

The epi stands for Extended(?) Packed Integer. epi instead of pi tells you it's an SSE intrinsic, not an MMX intrinsic. For intrinsics that convert from one element width to another, the intrinsics use the source width if that unambiguously identifies the operation (at least for the ones I looked at). e.g. _mm_packs_epi32 (packssdw) or _mm_unpackhi_epi16 (punpckhwd). PMOVZX needs both numbers, because there's _mm_cvtepu8_epi32 (pmovzxbd), _mm_cvtepu8_epi64 (pmovzxbq), etc.
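As a small illustration of the "source width" naming, the sketch below feeds 32-bit elements to _mm_packs_epi32 and gets saturated 16-bit elements back. The wrapper name pack_epi32_to_epi16 is hypothetical; it only needs SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack four int32 values into int16 with signed
   saturation (packssdw). The "epi32" in the intrinsic name is the SOURCE
   element width; the result holds 16-bit elements. */
void pack_epi32_to_epi16(const int32_t in[4], int16_t out[8])
{
    assert(in != NULL && out != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Low four results come from the first operand, high four from the
       second; the same vector is passed twice to keep the sketch short. */
    __m128i packed = _mm_packs_epi32(v, v);
    _mm_storeu_si128((__m128i *)out, packed);
}
```

Values outside the int16_t range are clamped, so an input of 70000 comes out as 32767 and -70000 as -32768.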


Compilers did of course support 64-bit integers in 32-bit mode, so it would have made sense for Intel to include intrinsics for working with them. But IIRC, in some compilers the 64x intrinsics are only available when compiling 64-bit code. The 64x is only relevant for converting to/from scalar 64-bit integers, so you won't find an x version of _mm_add_epi64 or anything like that.
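A rough sketch of where the x does and doesn't show up: the scalar-to-vector step uses _mm_set1_epi64x, while the vector-only arithmetic is plain _mm_add_epi64. The function name add_scalar_to_lanes is invented for this example, and the result goes back to memory with a store rather than a scalar-extract intrinsic, so nothing here depends on 64-bit-only conversions.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add one 64-bit scalar to both 64-bit lanes. */
void add_scalar_to_lanes(const int64_t in[2], int64_t addend, int64_t out[2])
{
    assert(in != NULL && out != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Scalar -> vector crosses the scalar/SIMD boundary: "x" name. */
    __m128i k = _mm_set1_epi64x((long long)addend);
    /* Vector-only operation (paddq): there is no "x" variant. */
    __m128i sum = _mm_add_epi64(v, k);
    _mm_storeu_si128((__m128i *)out, sum);
}
```

Adding 0xFFFFFFFF to a lane holding 1 gives 0x100000000, which shows the carry crossing the 32-bit boundary inside a single 64-bit lane.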

This only-in-64bit thing may still exist for _mm256_set1_epi64x depending on the compiler, but either way that history explains why 64x but not 32x.

(Sorry I'm lazy and didn't put together an experiment on Godbolt to check for current compilers with -m32. It might be interesting to see what kind of asm you get from casting int64_t to __m64 and using a _mm_set intrinsic in 32-bit code.)

Peter Cordes answered Nov 15 '22 10:11
