In SSE the prefixes 066h (operand-size override), 0F2h (REPNE) and 0F3h (REPE) are part of the opcode. In non-SSE code, 066h switches between 32-bit (or 64-bit) and 16-bit operation, and 0F2h and 0F3h are used for string operations. They can be combined, so 066h and 0F2h (or 0F3h) can be used in the same instruction, because this is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring mod/rm for now):
0f 58 addps
66 0f 58 addpd
f2 0f 58 addsd
f3 0f 58 addss
But what is this?
66 f2 0f 58
And how about?
f2 66 0f 58
Not to mention the following which has two conflicting REP prefixes:
f2 f3 0f 58
What is the spec for these?
I do not remember having seen any specification on what you should expect in the case of wildly combining random prefixes, so I guess CPU behaviour may be "undefined" and possibly CPU-specific. (Clearly, some things are specified in e.g. Intel's docs, but many cases aren't covered). And some combinations may be reserved for future use.
My naive assumption would generally have been that additional prefixes are no-ops, but there's no guarantee. The assumption seems reasonable given that, for example, some optimisation manuals recommend building a multi-byte NOP (canonically 90h) by prefixing with 66h:
db 66h, 90h ; 2-byte NOP
db 66h, 66h, 90h ; 3-byte NOP
db 66h, 66h, 66h, 90h ; 4-byte NOP
However, I also know that the CS and DS segment override prefixes have acquired novel functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.
Anyway, I looked at your examples above, always setting XMM1 to all 0 and XMM7 to all 0FFh by
pxor xmm1, xmm1 ; xmm1 <- 0s
pcmpeqw xmm7, xmm7 ; xmm7 <- FFs
and then the code in question, with xmm1, xmm7 arguments. What I observed (32-bit code on a Win64 system with an Intel T7300 Core 2 Duo) was:
1) no change observed for addsd by adding a 66h prefix:
db 66h
addsd xmm1, xmm7 ;total sequence = 66 F2 0F 58 CF
2) no change observed for addss by adding a 0F2h prefix:
db 0f2h
addss xmm1,xmm7 ;total sequence = F2 F3 0F 58 CF
3) However, I observed a change by prefixing addpd with 0F2h:
db 0f2h
addpd xmm1, xmm7 ;total sequence = F2 66 0F 58 CF
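In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh. Only the low 64 bits changed, which is what addsd would produce, so the 0F2h prefix appears to have taken precedence over 66h here.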
So my conclusion is that one shouldn't make any assumptions and should expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner Fog's manuals.
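To make the three observations easier to compare, here is a small Python sketch that models them. It is only a model of what was seen on that one Core 2 Duo, not a specification: the rule "the last 0F2h/0F3h wins, and either of them overrides 066h" is an assumption inferred from the results above, and the names ADD_0F58, effective_prefix and decode_0f58 are made up for illustration.

```python
# Mnemonics for opcode 0F 58, keyed by the mandatory prefix (None = no prefix).
ADD_0F58 = {None: "addps", 0x66: "addpd", 0xF2: "addsd", 0xF3: "addss"}


def effective_prefix(prefixes):
    """Pick the prefix that appeared to take effect in the tests above.

    Assumption (inferred from observation, not from any manual):
    the last F2/F3 wins, and either of them overrides 66.
    """
    rep = [p for p in prefixes if p in (0xF2, 0xF3)]
    if rep:
        return rep[-1]
    return 0x66 if 0x66 in prefixes else None


def decode_0f58(code):
    """Skip leading 66/F2/F3 bytes, expect 0F 58, and name the instruction."""
    i = 0
    while i < len(code) and code[i] in (0x66, 0xF2, 0xF3):
        i += 1
    if tuple(code[i:i + 2]) != (0x0F, 0x58):
        raise ValueError("not a 0F 58 encoding")
    return ADD_0F58[effective_prefix(code[:i])]


if __name__ == "__main__":
    for seq in ("66f20f58cf", "f2f30f58cf", "f2660f58cf"):
        print(seq, "->", decode_0f58(bytes.fromhex(seq)))
```

Under this model all three tested sequences come out the way they behaved: 66 F2 and F2 66 both act like addsd, and F2 F3 acts like addss. Other CPUs may well behave differently.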
Intel's SDM vol.2 manual (the instruction set reference) refers to these as mandatory prefixes. Think of them as part of the opcode.
But yes, they are prefixes and can be mixed with other prefixes ahead of the actual escape-byte+opcode. In fact a REX prefix must go after other prefixes.
As usual, using multiple conflicting prefixes from the same group happens to decode with the last one taking priority on current Intel hardware. I think Intel manuals say that doing this can give unpredictable behaviour so it's not guaranteed or future proof. It's not a meaningful thing to do; if you want to pad an instruction to make it longer for alignment reasons, I think repeating the same prefix a couple times is safe.
B.8 SSE INSTRUCTION FORMATS AND ENCODINGS
The SSE instructions use the ModR/M format and are preceded by the 0FH prefix byte. In general, operations are not duplicated to provide two directions (that is, separate load and store variants).
The following three tables (Tables B-22, B-23, and B-24) show the formats and encodings for the SSE SIMD floating-point, SIMD integer, and cacheability and memory ordering instructions, respectively. Some SSE instructions require a mandatory prefix (66H, F2H, F3H) as part of the two-byte opcode. Mandatory prefixes are included in the tables.
And also
2.1.2 Opcodes
A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit opcode field is sometimes encoded in the ModR/M byte. Smaller fields can be defined within the primary opcode. Such fields define the direction of operation, size of displacements, register encoding, condition codes, or sign extension. Encoding fields used by an opcode vary depending on the class of operation.
Two-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:
- An escape opcode byte 0FH as the primary opcode and a second opcode byte.
- A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous bullet).
For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte is a mandatory prefix (it is not considered as a repeat prefix).
Three-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:
- An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes.
- A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, plus two additional opcode bytes (same as previous bullet).
For example, PHADDW for XMM registers consists of the following sequence: 66 0F 38 01. The first byte is the mandatory prefix.
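The two opcode layouts quoted above can be illustrated with a short Python sketch that splits an encoding into its mandatory prefix, escape bytes and final opcode byte. The function name split_opcode is hypothetical, and the sketch simply assumes that a leading 66/F2/F3 is a mandatory prefix; a real decoder would have to consult the opcode tables, because 66 in front of 0F is an ordinary operand-size override for many non-SSE instructions.

```python
def split_opcode(code):
    """Split a 2- or 3-byte opcode into (mandatory_prefix, escape, opcode).

    Assumes the input starts with an optional 66/F2/F3 byte followed by
    the 0F escape; it does not check that the prefix is really mandatory
    for the opcode in question.
    """
    code = bytes(code)
    prefix = code[0] if code and code[0] in (0x66, 0xF2, 0xF3) else None
    rest = code[1:] if prefix is not None else code
    if not rest or rest[0] != 0x0F:
        raise ValueError("expected the 0F escape byte")
    # 0F 38 and 0F 3A introduce the three-byte opcode maps.
    n_escape = 2 if len(rest) > 1 and rest[1] in (0x38, 0x3A) else 1
    if len(rest) <= n_escape:
        raise ValueError("missing opcode byte")
    return prefix, rest[:n_escape], rest[n_escape]


if __name__ == "__main__":
    print(split_opcode(bytes.fromhex("f30fe6")))    # CVTDQ2PD
    print(split_opcode(bytes.fromhex("660f3801")))  # PHADDW (XMM form)
```

For the two examples from the manual, the first element of the result is the mandatory prefix (F3 and 66 respectively), which is the byte the quoted text says is not treated as a repeat or operand-size prefix.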