
ASM x86_64 AVX: xmm and ymm registers differences

What are the differences between xmm and ymm registers? I thought that xmm is for SSE, and ymm is for AVX, but I wrote some code:

vmovups     ymm1, [r9]      
vcvtss2si   rcx, ymm1

and it gives me:

error: invalid combination of opcode and operands

It's about the line:

vcvtss2si   rcx, ymm1

So I wrote:

vcvtss2si   rcx, xmm1

and it works as intended. The first value of the ymm1 vector, converted to an integer, is now in rcx.

What is this all about? Are ymm1 and xmm1 the same register?

SciArt asked Jan 07 '18 17:01


People also ask

What are XMM and YMM registers?

YMM registers are 256 bits long. XMM registers are 128 bits long and represent the lower 128 bits of the YMM registers. The two sets overlap: each XMM register is contained in the corresponding YMM register.

What are XMM registers used for?

XMM registers can only be used to perform calculations on data; they cannot be used to address memory. Addressing memory is accomplished by using the general-purpose registers. When an XMM register is stored to memory, it occupies consecutive bytes, with the low-order byte of the register being stored in the first byte in memory.

How many YMM registers are there?

As with the XMM registers, there are 16 YMM registers (ymm0 to ymm15) in 64-bit mode. A ymm register is twice as wide as an xmm register, so YMM registers make it possible to process eight single-precision or four double-precision floating-point numbers simultaneously.

How many XMM registers are there?

There are eight XMM registers available in non-64-bit modes and 16 XMM registers in long mode, each of which allows simultaneous operations on 16 bytes.




1 Answer

xmm0 is the low half of ymm0, exactly like eax is the low half of rax.

Writing to xmm0 (with a VEX-coded instruction, not legacy SSE) zeros the upper lane of ymm0, just like writing to eax zeros the upper half of rax to avoid false dependencies. Lack of zeroing the upper bytes for legacy SSE instructions is why there's a penalty for mixing AVX and legacy SSE instructions.
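As a rough illustration of the zeroing behaviour described above (NASM syntax; the register choices are arbitrary and only for demonstration):

```nasm
; Each comment describes the state of ymm0 / rax after that instruction.
vaddps  ymm0, ymm1, ymm2    ; 256-bit op: all 256 bits of ymm0 are written
addps   xmm0, xmm1          ; legacy SSE encoding: bits 127:0 updated, bits 255:128 of ymm0 kept
                            ; (this is the AVX / legacy-SSE mixing that can incur a penalty)
vaddps  xmm0, xmm0, xmm1    ; VEX-coded 128-bit op: bits 127:0 updated, bits 255:128 of ymm0 zeroed

mov     rax, -1             ; rax = 0xFFFFFFFFFFFFFFFF
mov     eax, 1              ; writing eax zeroes bits 63:32, so rax = 1
```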

Most AVX instructions are available in either 128-bit or 256-bit size, e.g. vaddps xmm0, xmm1, xmm2 or vaddps ymm0, ymm1, ymm2. (The 256-bit versions of most integer instructions are only available in AVX2, with AVX only providing the 128-bit version. There are a couple of exceptions, like vptest ymm, ymm in AVX1, and vmovdqu if you count that as an "integer" instruction.)

Scalar instructions like vmovd, vcvtss2si, and vcvtsi2ss are only available with XMM registers. Reading a YMM register is not logically different from reading an XMM register, but writing the low element (and leaving the other elements unmodified, like the poorly-designed vcvtsi2ss does) would be different for XMM vs. YMM, because the YMM version would leave the upper lane not zeroed.


But scalar with ymm doesn't exist in the machine-code encoding, even for instructions where it would be really useful like vpinsrd / vpextrd (insert / extract a scalar).
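Since there is no ymm form, reaching a scalar in the upper lane takes an extra lane extract or insert. A minimal sketch in NASM syntax, assuming the vector is in ymm0 and using xmm1 as a scratch register (both choices are arbitrary):

```nasm
; Extract dword element 5 of ymm0 (elements numbered 0..7).
vextractf128 xmm1, ymm0, 1       ; xmm1 = upper 128 bits of ymm0 (elements 4..7); AVX1
vpextrd      eax, xmm1, 1        ; element 1 of the upper lane = element 5 overall

; Insert eax as dword element 5 of ymm0.
vextractf128 xmm1, ymm0, 1       ; xmm1 = upper lane
vpinsrd      xmm1, xmm1, eax, 1  ; replace element 1 of that lane
vinsertf128  ymm0, ymm0, xmm1, 1 ; write the lane back into bits 255:128 of ymm0
```

With AVX2 available, vextracti128 / vinserti128 are the integer-domain equivalents of the lane instructions used here.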

Note that even though reading an XMM register and taking only the low scalar element is logically the same as YMM, for the actual implementation it would not be the same. Reading a YMM register implies an AVX-256 instruction, which would have to transition the CPU out of the "saved upper" state (for an Intel CPU with SSE/AVX transitions / states).
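On CPUs with those transition states, the usual way to get back to a clean state after 256-bit work is vzeroupper. A sketch in NASM syntax; the routine name double_in_place and the use of rdi as a pointer to at least 8 floats are assumptions for illustration:

```nasm
double_in_place:                    ; hypothetical routine
    vmovups     ymm0, [rdi]         ; load 8 floats
    vaddps      ymm0, ymm0, ymm0    ; AVX-256 work: upper halves are now in use
    vmovups     [rdi], ymm0         ; store the 8 results
    vzeroupper                      ; zero bits 255:128 of every ymm register
    ret                             ; code that runs next may use legacy SSE
```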

In any case, vcvtss2si rax, ymm0 is not encodable, and the assembler doesn't magically assemble it as vcvtss2si rax, xmm0. If you're writing in asm, you're supposed to know exactly what you're doing. (Some assemblers will optimize mov rax, 1 to mov eax, 1 for you, so in the same spirit an assembler could let you get away with writing ymm as a source register. But letting you write ymm as a destination register for vcvtsi2ss would change the meaning, so for consistency it's better that the assembler does neither.)
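For the code in the question, that means naming the xmm view of the register explicitly. A short sketch in NASM syntax, keeping the question's r9 as the pointer; the memory-operand variants are an addition, relying on vcvtss2si also accepting a 32-bit memory source:

```nasm
vmovups     ymm1, [r9]              ; load 8 packed floats, as in the question
vcvtss2si   rcx, xmm1               ; element 0: xmm1 is the low 128 bits of ymm1

; If only single elements are needed, the vector load can be skipped:
vcvtss2si   rcx, dword [r9]         ; element 0 straight from memory
vcvtss2si   rdx, dword [r9 + 4]     ; element 1 (each float is 4 bytes)
```

The conversion rounds according to the current MXCSR rounding mode; vcvttss2si is the truncating variant.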

Peter Cordes answered Sep 18 '22 05:09