What are the differences between xmm and ymm registers?
I thought that xmm is for SSE, and ymm is for AVX, but I wrote some code:
vmovups ymm1, [r9]
vcvtss2si rcx, ymm1
and it gives me:
error: invalid combination of opcode and operands
It's about the line:
vcvtss2si rcx, ymm1
So I wrote:
vcvtss2si rcx, xmm1
and it works as intended. The first value of the ymm1 vector, converted to an integer, is now in rcx.
What is it all about? Are ymm1 and xmm1 the same register?
YMM registers are 256 bits wide. XMM registers are 128 bits wide and are the lower 128 bits of the corresponding YMM registers. The two sets overlap: each XMM register is contained in the YMM register with the same number.
XMM registers can only be used to perform calculations on data; they cannot be used to address memory. Addressing memory is accomplished by using the general-purpose registers. When an XMM register is stored to memory, it occupies 16 consecutive bytes, with the low-order byte of the register being stored in the first byte in memory.
Similar to the XMM registers, there are 16 YMM registers (ymm0 to ymm15) on CPUs that support AVX, in 64-bit mode. A YMM register is twice as wide as an XMM register. Therefore, YMM registers make it possible to process eight single-precision floating-point numbers or four double-precision floating-point numbers simultaneously.
There are eight XMM registers available in non-64-bit modes and 16 XMM registers in long mode, each of which allows simultaneous operations on 16 bytes of data.
xmm0 is the low half of ymm0, exactly like eax is the low half of rax.
Writing to xmm0 (with a VEX-coded instruction, not legacy SSE) zeros the upper lane of ymm0, just like writing to eax zeros the upper half of rax, to avoid false dependencies. Legacy SSE instructions do not zero the upper bytes, which is why there's a penalty for mixing AVX and legacy SSE instructions.
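To make the difference between the two encodings concrete, here is a minimal sketch in NASM syntax. The function name upper_lane_demo and the use of rdi as a pointer to eight floats (x86-64 System V calling convention) are assumptions for illustration only, and the CPU is assumed to support AVX.

```nasm
; Hypothetical demo: void upper_lane_demo(const float *src)
; Assumes rdi points to at least 8 readable floats and the CPU supports AVX.
section .text
global upper_lane_demo
upper_lane_demo:
    vmovups ymm0, [rdi]        ; load 8 floats: all 256 bits of ymm0 are written
    vaddps  xmm0, xmm0, xmm0   ; VEX-coded 128-bit op: bits 255:128 of ymm0 are zeroed
    vmovups ymm1, [rdi]        ; the upper lane of ymm1 now holds data again
    vzeroupper                 ; zeros bits 255:128 of every YMM register
    addps   xmm1, xmm1         ; legacy SSE: writes bits 127:0 only and leaves the
                               ; upper lane as it was (zero here, after vzeroupper)
    ret
```

Running vzeroupper before any legacy SSE code is the usual way to avoid the mixing penalty on Intel CPUs that track the upper-lane state.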
Most AVX instructions are available with either 128-bit or 256-bit size, e.g. vaddps xmm0, xmm1, xmm2 or vaddps ymm0, ymm1, ymm2. (The 256-bit versions of most integer instructions are only available in AVX2, with AVX only providing the 128-bit version. There are a couple of exceptions, like vptest ymm, ymm in AVX1, and vmovdqu if you count that as an "integer" instruction.)
Scalar instructions like vmovd, vcvtss2si, and vcvtsi2ss are only available with XMM registers. Reading a YMM register is not logically different from reading an XMM register, but writing the low element (and leaving the other elements unmodified, like the poorly-designed vcvtsi2ss does) would be different for XMM vs. YMM, because the YMM version would leave the upper lane not zeroed.
But scalar with ymm doesn't exist in the machine-code encoding, even for instructions where it would be really useful, like vpinsrd / vpextrd (insert / extract a scalar).
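Since scalar instructions only accept XMM operands, a common workaround for reaching an element in the upper lane is to copy that lane into an XMM register first. The sketch below is one way to do it in NASM syntax; the function name cvt_elem0_and_elem4 and the argument registers (rdi for the float array, rsi for an output pointer, per the x86-64 System V calling convention) are assumptions for illustration, not part of the original code.

```nasm
; Hypothetical demo: long cvt_elem0_and_elem4(const float *src, long *elem4_out)
; Assumes rdi points to 8 floats, rsi to one writable 64-bit integer, AVX available.
section .text
global cvt_elem0_and_elem4
cvt_elem0_and_elem4:
    vmovups      ymm1, [rdi]      ; load 8 floats (unaligned load)
    vcvtss2si    rax, xmm1        ; element 0: the low scalar of xmm1 is also the low scalar of ymm1
    vextractf128 xmm2, ymm1, 1    ; copy the upper lane (elements 4..7) into xmm2
    vcvtss2si    rcx, xmm2        ; element 4, converted using the current MXCSR rounding mode
    mov          [rsi], rcx       ; store the second result through the output pointer
    vzeroupper                    ; clear upper lanes before returning to possibly-SSE code
    ret
```

The first conversion is exactly the fix from the question: name xmm1 instead of ymm1, because both names refer to the same low 128 bits.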
Note that even though reading an XMM register and taking only the low scalar element is logically the same as YMM, for the actual implementation it would not be the same. Reading a YMM register implies an AVX-256 instruction, which would have to transition the CPU out of the "saved upper" state (for an Intel CPU with SSE/AVX transitions / states).
In any case, vcvtss2si rax, ymm0 is not encodable, and the assembler doesn't magically assemble it as vcvtss2si rax, xmm0. If you're writing in asm, you're supposed to know exactly what you're doing. (Some assemblers will optimize mov rax, 1 to mov eax, 1 for you, so letting you get away with writing ymm as a source register would work. But letting you write ymm as a destination register for vcvtsi2ss would change the meaning, so for consistency it's better that the assembler doesn't do either.)