Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between shuffle and permute

In x86-64 SIMD instruction names, as well as the intrinsic functions you can use to access them from C/C++, you find both the terms shuffle (e.g., _mm_shuffle_epi32) and permute (e.g., _mm_permute_pd).

Superficially, they seem to both be used for data movement stuff. What's the difference?

like image 454
BeeOnRope Avatar asked Aug 15 '19 03:08

BeeOnRope


People also ask

What does NP random shuffle do?

shuffle. Modify a sequence in-place by shuffling its contents. This function only shuffles the array along the first axis of a multi-dimensional array.

Is riffle shuffle random?

Riffle seven times and you'll have a sufficiently random ordering of cards, an ordering that has likely never existed before. In other words, it's unlikely you'll ever shuffle two decks the same. The card shuffling result appears in a Numberphile video from 2015, along with a number of other card shuffling facts.

What is random permutation in Python?

Random Permutations of Elements A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of [1, 2, 3] and vice-versa. The NumPy Random module provides two methods for this: shuffle() and permutation() .

How do you randomly permute a list in Python?

sample() . random. shuffle() shuffles a list in place, and random. sample() returns a new randomized list.


1 Answers

I haven't looked for inspiration outside of x86. I don't think there's any standard convention here.


I think they just switched from "shuffle" to "permute" at some point in time between SSSE3 pshufb and AVX1 vpermilps/pd / vperm2f128. Everything before AVX is called "shuffle", everything after is called "permute".

(SSE4.x didn't introduce any instructions named "shuffle" or "permute", just pinsrd / pextrd and other operand sizes being the main shuffles that SSE4.1 added)


There are 2 exceptions to that, not counting VEX / EVEX encodings of vshufps, vpshufd, etc.:

AVX512F VSHUFF32X4 (and 64x2 and integer versions) 128-bit granularity lane shuffles with immediate control have the same design as vshufps: low half of the destination selects elements from the first source, high half selects from the 2nd source. e.g. _mm512_shuffle_i64x2(__m512i a, __m512i b, int imm); This naming is helpful to remember how the shuffle control works. With 4 output lanes, there's only room for 4x 2-bit selectors, not 4x 3-bit. The 256-bit operand-size version still has the same limitation so it only uses the low 2 bits of the immediate, like shufpd.

AVX512BITALG VPSHUFBITQMB is like vpmultishiftqb (parallel bitfield extract) + vector->mask (like a movemask). So it can select any 8 bits within each qword chunk of the input.

AVX512 256-bit granularity operations currently only exist with names like VEXTRACTF32x8 and VINSERTF32x8, not shuf or perm.


Intrinsic name do match instruction mnemonics as far as shuffle vs. permute, but otherwise can leave out "in lane" when the mnemonic has it, requiring the lane-crossing version to be different too. (e.g. AVX1 vpermilps = _mm_permute_ps imm8 or _mm_permutevar_ps __m128i control vs. AVX2 vpermps = _mm256_permutexvar_ps; not available with immediate control, but vpermpd is.

Intel's intrinsics guide only lists _mm256_permutevar8x32_ps for vpermps, while the ISA ref manual only lists permutexvar. I assume most compilers support the older permutexvar name. Anyway, weird choice, 8x32 sounds like an AVX512 instruction (with per-element masking); and maybe that's where that new intrinsic name came from.


There are no other pattern I've noticed. We can easily rule out all of the following hypotheses:

  • copy-and-shuffle (pshufd xmm, xmm/mem, imm) vs. in-place shuffle (pshufb data, idx or shufps xmm, xmm, imm)
  • immediate control vs. variable-control (pshufd vs. pshufb or AVX2 vpermd vs. vperm2i128)
  • integer vs. FP (SSE2 pshufd vs. shufps/pd ; AVX2 vpermps vs. vpermd)
  • 1-source vs. 2-source (pshufd vs. shufps ; AVX2 vpermd vs. AVX512 vpermt2d)
  • lane-crossing vs. in-lane (AVX1 vpermilps vs. AVX2 vpermps)

The shuffle-control immediate works the same way in pshufd and vpermq-immediate. But unlike the "tricky" vshuff32x4 case, both pshufd and vpermq work the obvious way so no need to make an analogy to another mnemonic. Also, "pshuf" is a bit awkward vs. "shuf" or "perm", so I can see why they'd want something else for packed-integer.

Note that the "shuf" names go all the way back to SSE1 shufps, introduced by Pentium III (Katmai) at the same time as MMX2 pshufw mm, mm, imm8.

P5 Pentium MMX didn't have any instructions named shuf/perm instructions, just the punpckl/h shuffles in various sizes.

https://nasm.us/doc/nasmdocb.html#section-B.1.7 (That NASM appendix is helpful because it sorts mnemonics into groups by order of introduction. It's what got me to notice the vshuff32x4 mnemonic down in the AVX512 stuff after I thought they'd switched to calling everything "perm".)

like image 58
Peter Cordes Avatar answered Oct 21 '22 09:10

Peter Cordes