(This question was originally about the CVTSI2SD
instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.)
I have a Pentium M CPU and a custom OS which so far used no SSE instructions, but I now need to use them.
Trying to execute any SSE instruction results in an interruption 6, illegal opcode (which in Linux would cause a SIGILL
, but this isn't Linux), also referred to in the Intel architectures software developer's manual (which I refer from now on as IASDM) as #UD - Invalid Opcode (UnDefined Opcode).
Edit: Peter Cordes actually identified the right cause, and pointed me to the solution, which I resume below:
If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.
Indeed, the IASDM mentions this:
If an operating system did not provide adequate system level support for SSE, executing an SSE or SSE2 instructions can also generate #UD.
Peter Cordes pointed me to the SSE OSDev wiki, which describes how to enable SSE by writing to both CR0
and CR4
control registers:
clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
Note that, in order to be able to write to these registers, if you are in protected mode, then you need to be in privilege level 0. The answer to this question explains how to test it: if in protected mode, that is, when bit 0 (PE
) in CR0
is set to 1, then you can test bits 0 and 1 from the CS
selector, which should be both 0.
Finally, the custom OS must properly handle XMM registers during context switches, by saving and restoring them when necessary.
If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case all instructions that touch xmm regs will fault.
Took me a sec to find, but http://wiki.osdev.org/SSE explains how to alter CR0 and CR4 to allow SSE instructions to run on bare metal without #UD
.
My first thought on your old version of the question was
that you might have compiled your program with -mavx
, -march=sandybridge
or equivalent, causing the compiler to emit the VEX-encoded version of everything.
CVTSI2SD xmm1, xmm2/m32 ; SSE2
VCVTSI2SD xmm1, xmm2, xmm3/m32 ; AVX
See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.
Related: Which versions of Windows support/require which CPU multimedia extensions? has some details about how to check for support for AVX and AVX512 (which also introduce new architectural state, so the OS has to set a bit or the HW will fault). It's coming at it from the other angle, but the links should indicate how to activate / disable AVX support.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With