x64 SSE data types

AMD64 Architecture Programmer’s Manual Volume 1: Application Programming page 226 says regarding SSE instructions:

The processor does not check the data type of instruction operands prior to executing instructions. It only checks them at the point of execution. For example, if the processor executes an arithmetic instruction that takes double-precision operands but is provided with single-precision operands by MOVx instructions, the processor will first convert the operands from single precision to double precision prior to executing the arithmetic operation, and the result will be correct. However, the required conversion may cause degradation of performance.

I don't understand this; I would have thought ymm registers simply contain 256 bits which each instruction interprets according to its expected operands, it's up to you to make sure the correct types are present, and in the scenario described, the CPU would run at full speed and silently give the wrong answer.

What am I missing?
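(To make the "bits interpreted according to the instruction" mental model concrete, here is a plain-C sketch of re-reading register bits with the wrong type. The helper name `reread_as_double` is mine, and this only mimics at C level what a double-precision instruction would see in an XMM lane loaded with singles; it is not x86 code.)

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the concatenated bit patterns of two packed singles as one
 * double, the way a double-precision instruction would re-read a 64-bit
 * XMM lane that was filled with single-precision data. */
static double reread_as_double(float lo, float hi) {
    uint32_t bits_lo, bits_hi;
    memcpy(&bits_lo, &lo, sizeof bits_lo);
    memcpy(&bits_hi, &hi, sizeof bits_hi);
    uint64_t bits = ((uint64_t)bits_hi << 32) | bits_lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;  /* generally a nonsense value, not lo or hi */
}
```

Under that model, the mismatched instruction would simply produce whatever number the foreign bit pattern happens to encode, which is why the manual's claim that the conversion is done for you (at a performance cost) is surprising.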

asked Mar 10 '13 by rwallace


1 Answer

The Intel® 64 and IA-32 Architectures Optimization Reference Manual §5.1 says something similar about mixing integer/FP "data types" (but curiously not singles and doubles):

When writing SIMD code that works for both integer and floating-point data, use the subset of SIMD convert instructions or load/store instructions to ensure that the input operands in XMM registers contain data types that are properly defined to match the instruction.

Code sequences containing cross-typed usage produce the same result across different implementations but incur a significant performance penalty. Using SSE/SSE2/SSE3/SSSE3/SSE4.1 instructions to operate on type-mismatched SIMD data in the XMM register is strongly discouraged.
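(The "convert instructions" the manual recommends do value conversion, not bit reinterpretation; the distinction is easy to see in plain C. This is a hedged sketch in C semantics, with comments naming the x86 instruction each operation loosely corresponds to.)

```c
#include <stdint.h>
#include <string.h>

/* Value conversion, the job of CVTSS2SD/CVTPS2PD: the numeric value is
 * preserved, the bit pattern changes. */
static double convert_value(float f) {
    return (double)f;
}

/* Expose a double's raw bit pattern, to show that conversion really does
 * rewrite the bits rather than copying them. */
static uint64_t bits_of(double d) {
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}
```

A typed move like MOVAPS, by contrast, copies bits untouched, so it is the conversion instructions that keep the register's contents matched to the instructions that will consume them.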

The Intel® 64 and IA-32 Architectures Software Developer’s Manual is similarly confusing:

SSE and SSE2 extensions define typed operations on packed and scalar floating-point data types and on 128-bit SIMD integer data types, but IA-32 processors do not enforce this typing at the architectural level. They only enforce it at the microarchitectural level.

...

Pentium 4 and Intel Xeon processors execute these instructions without generating an invalid-operand exception (#UD) and will produce the expected results in register XMM0 (that is, the high and low 64-bits of each register will be treated as a double-precision floating-point value and the processor will operate on them accordingly).

...

In this example: XORPS or PXOR can be used in place of XORPD and yield the same correct result. However, because of the type mismatch between the operand data type and the instruction data type, a latency penalty will be incurred due to implementations of the instructions at the microarchitecture level.
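(The reason XORPS, XORPD and PXOR are architecturally interchangeable here is that a bitwise operation doesn't care how the lanes are typed or sliced. A minimal plain-C sketch, with helper names of my own choosing; XORing a value's bits with themselves yields the all-zero pattern, i.e. +0.0, whether you work in one 64-bit lane or two 32-bit lanes.)

```c
#include <stdint.h>
#include <string.h>

/* Zero a double by XORing its bits with themselves as one 64-bit lane,
 * loosely analogous to XORPD. */
static double xor_zero_64(double d) {
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    b ^= b;
    double r;
    memcpy(&r, &b, sizeof r);
    return r;
}

/* Same thing as two 32-bit lanes, loosely analogous to XORPS/PXOR on the
 * same register contents: the result bits are identical. */
static double xor_zero_32(double d) {
    uint32_t b[2];
    memcpy(b, &d, sizeof b);
    b[0] ^= b[0];
    b[1] ^= b[1];
    double r;
    memcpy(&r, b, sizeof r);
    return r;
}
```

So the architectural result is the same either way; the penalty the manual describes is purely a microarchitectural effect of routing "integer-typed" results into the FP domain (or vice versa).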

Latency penalties can also be incurred by using move instructions of the wrong type. For example, MOVAPS and MOVAPD can both be used to move a packed single-precision operand from memory to an XMM register. However, if MOVAPD is used, a latency penalty will be incurred when a correctly typed instruction attempts to use the data in the register.

Note that these latency penalties are not incurred when moving data from XMM registers to memory.

I really have no idea what it means by "they only enforce it at the microarchitectural level" except that it suggests the different "data types" are treated differently by the μarch. I have a few guesses:

  • AIUI, x86 cores typically use register renaming due to the shortage of registers. Perhaps they internally use different registers for integer/single/double operands so they can be located nearer to the respective vector units.
  • It also seems possible that FP numbers are represented internally using a different format (e.g. using a bigger exponent to get rid of denorms) and converted to the canonical bits only when necessary.
  • CPUs use "forwarding" or "bypassing" so that execution units don't have to wait for data to be written to a register before it can be used by subsequent instructions, typically saving a cycle or two. This might not happen between the integer and FP units.
answered Dec 14 '22 by tc.