I have been going through the Intel Intrinsics, and every function works on integers, floats or doubles that are packed, unpacked or extended packed.
It seems like this question should be answered somewhere on the internet, but I can't find the answer at all.
What is that packing thing?
Well, I've just been searching for the answer to the same question, also with no success, so I can only guess.
Intel introduced packed instructions already in their MMX technology. For example, they introduced the function
__m64 _mm_add_pi8 (__m64 a, __m64 b)
At that time there was no such thing as "extended packed". The only data type was __m64, and all operations worked on integers.
SSE brought 128-bit XMM registers and operations on single-precision floating-point numbers. SSE2 then added double-precision operations and a superset of the MMX integer operations, performed in the 128-bit registers. For example,
__m128i _mm_add_epi8 (__m128i a, __m128i b)
Here, for the first time, we see the "ep" ("extended packed") part of the function name. Why was it introduced? I believe it solved the problem of the name _mm_add_pi8 being already taken by the MMX intrinsic listed above: the SSE/AVX intrinsics interface is in C, where function names cannot be overloaded.
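To make "packed" concrete, here is a minimal sketch (my own, not taken from Intel's documentation) built on the SSE2 intrinsic above. It assumes an x86-64 compiler, where SSE2 is enabled by default; the helper name add16_epi8 is hypothetical. One packed add handles 16 independent 8-bit lanes, and overflow in one lane wraps around without carrying into its neighbor.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_add_epi8, unaligned load/store */
#include <stdint.h>    /* int8_t */

/* Hypothetical helper: adds two arrays of 16 signed 8-bit integers with a
   single packed add (paddb). Each lane wraps modulo 256 on its own; no
   carry crosses a lane boundary. */
void add16_epi8(const int8_t a[16], const int8_t b[16], int8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi8(va, vb); /* 16 lane-wise additions at once */
    _mm_storeu_si128((__m128i *)out, vsum);
}
```

For example, adding 1 to a lane holding 127 yields -128 in that lane only, while the other 15 lanes are unaffected.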
With AVX and its successors (AVX2, AVX-512), Intel chose a different strategy and started to put the register width right after the leading "_mm", cf.:
__m256i _mm256_add_epi8 (__m256i a, __m256i b)
__m512i _mm512_add_epi8 (__m512i a, __m512i b)
Why they chose "ep" here rather than "p" is a mystery, and irrelevant for programmers. In practice, they seem to use "p" for operations on floats and doubles and "ep" for integers:
__m128d _mm_add_pd (__m128d a, __m128d b); // "pd": packed doubles
__m256 _mm256_add_ps (__m256 a, __m256 b); // "ps": packed single-precision floats
Perhaps this goes back to the transition from MMX to SSE, where "ep" was introduced for integer operations (MMX handled no floats), combined with an attempt to keep the AVX names as close to the SSE ones as possible.
Thus, from a programmer's perspective, there's basically no difference between "ep" ("extended packed") and "p" ("packed"), because we already know which register width our code targets.
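The "s" (scalar) versus "p" (packed) distinction, by contrast, does change behavior. A minimal sketch for doubles, assuming SSE2 on x86-64 (the helper names add_packed_pd and add_scalar_sd are hypothetical): _mm_add_pd adds both 64-bit lanes, while the scalar _mm_add_sd adds only the low lane and copies the high lane from its first argument.

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_add_pd, _mm_add_sd */

/* Packed ("pd"): both double-precision lanes are added. */
void add_packed_pd(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}

/* Scalar ("sd"): only the low lane is added; the high lane of the
   result is copied unchanged from the first operand (a). */
void add_scalar_sd(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}
```

With a = {1, 2} and b = {10, 20}, the packed version gives {11, 22} and the scalar version gives {11, 2}.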
As for the next part of the question, "unpacking" belongs to a completely different category of notions than "scalar" and "packed". It is the name of a particular data rearrangement (a kind of shuffle): the unpack instructions interleave elements taken from the low or high halves of two vectors.
The reason for using "epi" in the name of intrinsics like _mm256_unpackhi_epi16 is that it is a true vector (not scalar) function on a vector of 16-bit integer elements, with the trailing "i16" naming the element type. Notice that "unpack" belongs to the part of the function name that describes its action (like mul, add, or permute), whereas "s" / "p" / "ep" (scalar, packed, extended packed) belong to the part describing the operation mode (scalar for "s", vector for "p" or "ep").
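A minimal sketch of what unpacking does, using the 128-bit SSE2 variants so it compiles on any x86-64 compiler without extra flags (the helper name interleave_epi16 is hypothetical). The 256-bit _mm256_unpacklo_epi16 / _mm256_unpackhi_epi16 perform the same interleave separately within each 128-bit lane.

```c
#include <emmintrin.h> /* SSE2: _mm_unpacklo_epi16, _mm_unpackhi_epi16 */
#include <stdint.h>    /* int16_t */

/* Hypothetical helper: interleaves eight 16-bit elements of a with eight of b.
   lo receives a0 b0 a1 b1 a2 b2 a3 b3 (from the low halves),
   hi receives a4 b4 a5 b5 a6 b6 a7 b7 (from the high halves). */
void interleave_epi16(const int16_t a[8], const int16_t b[8],
                      int16_t lo[8], int16_t hi[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)lo, _mm_unpacklo_epi16(va, vb));
    _mm_storeu_si128((__m128i *)hi, _mm_unpackhi_epi16(va, vb));
}
```

No arithmetic happens here; the elements are only rearranged, which is why "unpack" describes the action rather than the operation mode.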
(There are no scalar-integer instructions that operate between two XMM registers, but "si" does appear in the intrinsic name for movd eax, xmm0: _mm_cvtsi128_si32. There are a few similar intrinsics.)
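For illustration, a sketch of _mm_cvtsi128_si32 and its reverse, _mm_cvtsi32_si128 (both SSE2). The wrapper names low_i32 and from_i32 are hypothetical; they exist only to label the two directions of the movd transfer.

```c
#include <emmintrin.h> /* SSE2: _mm_cvtsi128_si32, _mm_cvtsi32_si128 */

/* "si128" -> "si32": copy the low 32 bits of an XMM register into a
   general-purpose register (movd r32, xmm). The upper lanes are ignored. */
int low_i32(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

/* "si32" -> "si128": the reverse move (movd xmm, r32); the three upper
   32-bit lanes of the result are zeroed. */
__m128i from_i32(int x)
{
    return _mm_cvtsi32_si128(x);
}
```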