AVX512 vector length and SAE control

Question

My question concerns EVEX-encoded packed register-to-register instructions without rounding semantics that allow SAE (Suppress All Exceptions) control, such as VMIN*, VCVTT*, VGETEXP*, VREDUCE*, VRANGE*, etc. Intel declares SAE support only for the full 512-bit vector length, e.g.

VMINPD xmm1 {k1}{z}, xmm2, xmm3
VMINPD ymm1 {k1}{z}, ymm2, ymm3
VMINPD zmm1 {k1}{z}, zmm2, zmm3{sae}

but I don't see a reason why SAE couldn't be applied to instructions where xmm or ymm registers are used.

Table 4-7 in chapter 4.6.4 of the Intel Instruction Set Extensions Programming Reference says that, in instructions without rounding semantics, bit EVEX.b specifies that SAE is applied, and bits EVEX.L'L specify the explicit vector length:

00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)
11b: reserved

so their combination should be legal.

However, NASM assembles vminpd zmm1,zmm2,zmm3,{sae} as 62F1ED185DCB, i.e. EVEX.L'L=00b, EVEX.b=1, which NDISASM 2.12 disassembles back as vminpd xmm1,xmm2,xmm3.

NASM refuses to assemble vminpd ymm1,ymm2,ymm3,{sae}, and NDISASM disassembles 62F1ED385DCB (EVEX.L'L=01b, EVEX.b=1) as vminpd xmm1,xmm2,xmm3.
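To make the field values above easy to check, here is a minimal Python sketch that splits the EVEX prefix of the two byte strings into its fields. The function name decode_evex and the dictionary keys are my own choices for illustration; the bit layout assumed is the one from Intel's EVEX prefix description (0x62, then three payload bytes), and the sketch does not try to validate anything beyond the leading 0x62.

```python
def decode_evex(code: bytes) -> dict:
    """Split the 4-byte EVEX prefix into its fields (hypothetical helper)."""
    if len(code) < 4 or code[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    p0, p1, p2 = code[1], code[2], code[3]
    return {
        "mm": p0 & 0x3,                    # opcode map (01b = 0F)
        "W": p1 >> 7,                      # EVEX.W
        "vvvv": ((p1 >> 3) & 0xF) ^ 0xF,   # register in EVEX.vvvv (stored inverted)
        "pp": p1 & 0x3,                    # implied prefix (01b = 66)
        "z": p2 >> 7,                      # zeroing-masking bit
        "LL": (p2 >> 5) & 0x3,             # EVEX.L'L
        "b": (p2 >> 4) & 0x1,              # EVEX.b
        "aaa": p2 & 0x7,                   # opmask register k0..k7
    }


if __name__ == "__main__":
    for hexcode in ("62F1ED185DCB", "62F1ED385DCB"):
        fields = decode_evex(bytes.fromhex(hexcode))
        ll, b = fields["LL"], fields["b"]
        print(f"{hexcode}: L'L={ll:02b}b b={b}")
```

Run on the two encodings, it reports L'L=00b, b=1 for the first and L'L=01b, b=1 for the second, which matches the values quoted above.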

I wonder how a Knights Landing CPU executes VMINPD ymm1, ymm2, ymm3{sae} (assembled as 62F1ED385DCB, EVEX.L'L=01b, EVEX.b=1):

1. The CPU throws an exception. Intel doc Table 4-7 is misleading.
2. SAE is in effect and the CPU operates on xmm only, the same as in scalar operations. NASM and NDISASM do it right; the Intel documentation is wrong.
3. SAE is ignored and the CPU operates on 256 bits, according to the VMINPD specification in the Intel doc. NASM and NDISASM are wrong.
4. SAE is in effect and the CPU operates on 256 bits, as specified in the instruction code. NASM and NDISASM are wrong, and the Intel doc needs to additionally decorate the xmm/ymm instructions with {sae}.
5. SAE is in effect and the CPU operates with the implied full vector size of 512 bits, regardless of EVEX.L'L, the same as if static rounding {er} were allowed. NDISASM and Intel doc Table 4-7 are wrong.

Ross Ridge · Accepted Answer

Your VMINPD ymm1, ymm2, ymm3{sae} instruction is invalid. According to the instruction set reference for MINPD in the Intel Architecture Instruction Set Extensions Programming Reference (February 2016), only the following encodings are allowed:

66 0F 5D /r                  MINPD xmm1, xmm2/m128 
VEX.NDS.128.66.0F.WIG 5D /r  VMINPD xmm1, xmm2, xmm3/m128
VEX.NDS.256.66.0F.WIG 5D /r  VMINPD ymm1, ymm2, ymm3/m256
EVEX.NDS.128.66.0F.W1 5D /r  VMINPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
EVEX.NDS.256.66.0F.W1 5D /r  VMINPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
EVEX.NDS.512.66.0F.W1 5D /r  VMINPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}

Notice that only the last version is shown with a {sae} suffix, meaning it's the only form of the instruction you're allowed to use it with. Just because the bits exist to encode a particular instruction doesn't mean it's valid.

Also note that section 4.6.3, SAE Support in EVEX, makes it clear that SAE doesn't apply to 128-bit or 256-bit vectors:

The EVEX encoding system allows arithmetic floating-point instructions without rounding semantic to be encoded with the SAE attribute. This capability applies to scalar and 512-bit vector lengths, register-to-register only, by setting EVEX.b. When EVEX.b is set, “suppress all exceptions” is implied. [...]

I'm not sure, however, whether your hand-crafted instruction would generate an Invalid Opcode exception, whether the EVEX.b bit would simply be ignored, or whether the EVEX.L'L bits would be ignored. EVEX-encoded VMINPD instructions belong to the Type E2 exception class, and according to Table 4-17, Type E2 Class Exception Conditions, the instruction can generate a #UD exception in any of the following cases:

State requirement, Table 4-8 not met.

Opcode independent #UD condition in Table 4-9.

Operand encoding #UD conditions in Table 4-10.

Opmask encoding #UD condition of Table 4-11.

If EVEX.L’L != 10b (VL=512).

Only the last reason seems to apply here, but it would mean that your instruction would generate a #UD exception with or without the {sae} modifier. Since this seems to directly contradict the allowed encodings in the instruction summary, I'm not sure what would happen.

Esteis · Answer

On Twitter, iximeow gives some addenda to Ross Ridge's answer above:

ross ridge is right that the text is invalid, but the important detail is that L'L selects the specific SAE mode, so if you set L'L to indicate ymm, you just get {rd-sae}

this is to say, if you set b for sae at all, the vector width is immediately fixed to 512 bits

vector widths are fixed to 512 bits*

*except for some cvt instructions where one operand is 512 bits and one operand is smaller

(@Pepijn's comment on Ross's answer already linked to those tweets, but I figured it's worth making this a separate answer, if only for visibility.)
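The behaviour iximeow describes can be written down as a small Python sketch. This is only a model of the tweets, not something verified on hardware: the function name interpret_packed and its return keys are hypothetical, the rounding-mode mapping is assumed from Intel's EVEX static rounding table, and the sketch deliberately ignores scalar instructions and the cvt exceptions mentioned in the footnote.

```python
# Static rounding mode selected by EVEX.L'L when EVEX.b=1 on a reg-reg form
# (assumption: taken from Intel's EVEX rounding-control encoding).
ROUNDING = {0b00: "rn-sae", 0b01: "rd-sae", 0b10: "ru-sae", 0b11: "rz-sae"}

# Explicit vector length selected by EVEX.L'L when EVEX.b=0; 11b is reserved.
VECTOR_BITS = {0b00: 128, 0b01: 256, 0b10: 512}


def interpret_packed(ll: int, b: int, reg_reg: bool = True) -> dict:
    """Model how a packed EVEX instruction reads L'L and b (hypothetical helper)."""
    if b and reg_reg:
        # Per the tweets: setting b on a reg-reg form fixes the width to
        # 512 bits, and L'L is reused as the rounding-mode selector.
        # For SAE-only instructions such as VMINPD the rounding mode is moot.
        return {"vector_bits": 512, "sae": True,
                "rc": ROUNDING[ll], "broadcast": False}
    if ll not in VECTOR_BITS:
        raise ValueError("EVEX.L'L=11b is reserved")
    # With a memory operand, b requests an embedded broadcast instead.
    return {"vector_bits": VECTOR_BITS[ll], "sae": False,
            "rc": None, "broadcast": bool(b)}


if __name__ == "__main__":
    # The hand-crafted encoding from the question: L'L=01b, b=1, reg-reg.
    print(interpret_packed(0b01, 1))
```

Under this model the question's 62F1ED385DCB is read as a 512-bit operation with SAE in effect, which corresponds to the last of the five possibilities listed in the question.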
