Intel AVX-512: how to set the EVEX.z bit

Tags: x86, assembly, machine-code, avx512

The EVEX.z bit is used in AVX-512 in conjunction with the k registers to control masking. If the z bit is 0, it's merge-masking; if the z bit is 1, it's zero-masking, meaning the elements whose bit in the k register is 0 are zeroed in the output.

The syntax looks like this:

VPSUBQ zmm0{k2}{z},zmm1,zmm2

where {z} represents the z bit.

But how do you set or test the EVEX.z bit? I've searched every resource I can find but I haven't found an answer.

asked Mar 20 '20 16:03

RTC222

1 Answer

As I understand it, what they mean is that VPSUBQ zmm0{k2}{z},zmm1,zmm2 and VPSUBQ zmm0{k2},zmm1,zmm2 are two different instructions, whose encodings differ in a single bit, called the "z bit". (It's specifically part of the EVEX prefix to the instruction. Wikipedia documents all the fields.)

So you "set the z bit" by specifying {z} in your assembler source, telling the assembler to generate an instruction with the corresponding bit set. This is documented in lots of places, like Intel's vol. 2 instruction set manual, and somewhat in Intel's intrinsics guide, which has mask (merge-masking) and maskz (zero-masking) versions of most intrinsics.
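To make the difference between the two forms concrete, here is a minimal Python model of what each one computes. It is only a sketch of the semantics, not real vector code: the function name `vpsubq_model` and the list-of-integers representation of a zmm register are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # each element is a 64-bit quadword

def vpsubq_model(dst, a, b, k, zeroing):
    """Model of VPSUBQ dst{k}{z}, a, b on lists of unsigned 64-bit ints.

    Hypothetical helper for illustration; 'zeroing' plays the role of {z}.
    """
    out = []
    for i in range(len(dst)):
        if (k >> i) & 1:
            out.append((a[i] - b[i]) & MASK64)  # selected lane: a - b, wrapping
        elif zeroing:
            out.append(0)                       # {z} present: unselected lane is zeroed
        else:
            out.append(dst[i])                  # no {z}: unselected lane keeps old value
    return out

old = [111] * 8                        # previous contents of the destination
a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [1] * 8
k2 = 0b00000101                        # only lanes 0 and 2 are selected

merged = vpsubq_model(old, a, b, k2, zeroing=False)
zeroed = vpsubq_model(old, a, b, k2, zeroing=True)
print(merged)  # [9, 111, 29, 111, 111, 111, 111, 111]
print(zeroed)  # [9, 0, 29, 0, 0, 0, 0, 0]
```

Note that `zeroing` is an argument fixed at each call site, which mirrors how {z} is fixed in each instruction's encoding rather than read from some register.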

It is not a physical bit in the CPU state, like the direction flag, that would persist from one instruction to the next. So it doesn't make sense to "test" it.

To illustrate, here's what I get by assembling both versions:

00000000  62F1F5CAFBC2      vpsubq zmm0{k2}{z},zmm1,zmm2
00000006  62F1F54AFBC2      vpsubq zmm0{k2},zmm1,zmm2

Note the encodings differ only in the high bit of the fourth byte (CA vs. 4A), which is the last byte of the four-byte EVEX prefix. That's your "z bit".
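If you want to see that bit in isolation, the following Python sketch pulls apart the fourth byte of the two encodings listed above. The helper name `evex_p2_fields` is made up for this example, and the field layout (z, L'L, b, V', aaa from high to low bit) is the one given in Intel's manual for the last EVEX payload byte; the sketch assumes the bytes start with a 64-bit-mode EVEX prefix (0x62).

```python
def evex_p2_fields(encoding_hex):
    """Split the last EVEX prefix byte into its fields (illustrative helper)."""
    code = bytes.fromhex(encoding_hex)
    if code[0] != 0x62:
        raise ValueError("not an EVEX-prefixed instruction")
    p2 = code[3]                      # fourth byte of the instruction
    return {
        "z":   (p2 >> 7) & 1,         # 1 = zero-masking, 0 = merge-masking
        "LL":  (p2 >> 5) & 0b11,      # vector length (0b10 = 512-bit)
        "b":   (p2 >> 4) & 1,         # broadcast / rounding control
        "V'":  (p2 >> 3) & 1,         # stored inverted, extends the vvvv field
        "aaa": p2 & 0b111,            # mask register number (k0..k7)
    }

with_z = evex_p2_fields("62F1F5CAFBC2")     # vpsubq zmm0{k2}{z},zmm1,zmm2
without_z = evex_p2_fields("62F1F54AFBC2")  # vpsubq zmm0{k2},zmm1,zmm2
print(with_z)     # {'z': 1, 'LL': 2, 'b': 0, "V'": 1, 'aaa': 2}
print(without_z)  # {'z': 0, 'LL': 2, 'b': 0, "V'": 1, 'aaa': 2}
```

Every field other than z is identical, and aaa = 2 matches the k2 in the source.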

Maybe you were thinking that you could "set" or "clear" the z bit at runtime, thus changing the masking effect of subsequent instructions? Since it's part of the encoding of each instruction, not the CPU state, that way of thinking only works if you were JITing the instructions on the fly or using self-modifying code.
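For the JIT case, "setting the z bit" amounts to flipping that one bit in the instruction bytes before they are executed. Here is a rough Python sketch operating on a byte buffer; `set_evex_z` is a hypothetical helper, it assumes the buffer begins with the 0x62 EVEX prefix byte, and it does not check that the particular instruction actually permits zero-masking, which a real code generator would need to do.

```python
Z_BIT = 0x80  # high bit of the fourth byte, as seen in the listing above

def set_evex_z(code, zeroing):
    """Return a copy of one EVEX-encoded instruction with the z bit set or cleared.

    Illustrative only: no validation beyond checking the 0x62 prefix byte.
    """
    patched = bytearray(code)
    if len(patched) < 4 or patched[0] != 0x62:
        raise ValueError("not an EVEX-prefixed instruction")
    if zeroing:
        patched[3] |= Z_BIT             # becomes the {z} form
    else:
        patched[3] &= ~Z_BIT & 0xFF     # becomes the merge-masking form
    return bytes(patched)

merge_form = bytes.fromhex("62F1F54AFBC2")   # vpsubq zmm0{k2},zmm1,zmm2
zero_form = set_evex_z(merge_form, zeroing=True)
print(zero_form.hex().upper())                             # 62F1F5CAFBC2
print(set_evex_z(zero_form, zeroing=False).hex().upper())  # 62F1F54AFBC2
```

The patched bytes match the two encodings the assembler produced, which is all the sketch is meant to show; actually running them would additionally require writing them into executable memory.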

In "normal" ahead-of-time code, you'll have to write the code in both versions, once with {z} instructions and once without. Use a conditional jump to decide which version to execute.

answered Oct 02 '22 07:10

Nate Eldredge
