 

How do I enable SSE for my freestanding bootable code?

(This question was originally about the CVTSI2SD instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.)

I have a Pentium M CPU and a custom OS which so far has used no SSE instructions, but I now need to use them.

Trying to execute any SSE instruction results in interrupt 6, illegal opcode (which on Linux would cause a SIGILL, but this isn't Linux), referred to in the Intel architectures software developer's manual (IASDM from now on) as #UD - Invalid Opcode (UnDefined Opcode).

Edit: Peter Cordes identified the right cause and pointed me to the solution, which I summarize below:

If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.

Indeed, the IASDM mentions this:

If an operating system did not provide adequate system level support for SSE, executing an SSE or SSE2 instructions can also generate #UD.

Peter Cordes pointed me to the OSDev wiki page on SSE, which describes how to enable SSE by writing to both the CR0 and CR4 control registers:

clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
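
For reference, the four steps above can be sketched in C roughly as follows. This is only a sketch: the helper names (cr0_with_sse, cr4_with_sse, enable_sse) are made up for illustration, the inline asm assumes a GCC-compatible compiler targeting x86, and enable_sse() must only ever be called from ring 0.

```c
#include <stdint.h>

/* Bit positions as given in the list above. */
#define CR0_MP          (1ul << 1)   /* monitor coprocessor */
#define CR0_EM          (1ul << 2)   /* x87 emulation */
#define CR4_OSFXSR      (1ul << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1ul << 10)  /* OS handles SIMD FP exceptions */

/* Hypothetical helper: CR0 value with EM cleared and MP set. */
static inline unsigned long cr0_with_sse(unsigned long cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Hypothetical helper: CR4 value with OSFXSR and OSXMMEXCPT set. */
static inline unsigned long cr4_with_sse(unsigned long cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Ring 0 only: moves to/from control registers fault at any other
 * privilege level. Assumes GCC-style extended inline asm. */
static inline void enable_sse(void)
{
    unsigned long cr0, cr4;

    __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
    cr0 = cr0_with_sse(cr0);
    __asm__ volatile("mov %0, %%cr0" : : "r"(cr0) : "memory");

    __asm__ volatile("mov %%cr4, %0" : "=r"(cr4));
    cr4 = cr4_with_sse(cr4);
    __asm__ volatile("mov %0, %%cr4" : : "r"(cr4) : "memory");
}
#endif
```

Keeping the bit arithmetic in separate pure functions means it can be checked on a normal host, while the privileged register moves stay in one small routine that the kernel calls once during early boot.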

Note that writing to these registers from protected mode requires privilege level 0. A linked answer explains how to test this: in protected mode (that is, when bit 0 (PE) of CR0 is set to 1), bits 0 and 1 of the CS selector hold the current privilege level, and both should be 0.
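
A minimal sketch of that privilege test, assuming the caller has already obtained the CR0 and CS values somehow. The function names are hypothetical, and treating the PE = 0 case (real mode) as "allowed" is my reading of the note above rather than something the linked answer states.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE        (1ul << 0)  /* protection enable */
#define CS_LEVEL_MASK 0x3u        /* bits 0 and 1 of the CS selector */

/* Hypothetical helper: true if control-register writes should be allowed.
 * Assumption: with PE clear (real mode) there is no privilege check. */
static inline bool can_write_control_regs(unsigned long cr0, uint16_t cs)
{
    if ((cr0 & CR0_PE) == 0)
        return true;
    return (cs & CS_LEVEL_MASK) == 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Reading CS is not privileged, so this works at any level.
 * Assumes GCC-style extended inline asm. */
static inline uint16_t read_cs(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}
#endif
```

For example, a selector of 0x08 has both low bits clear (level 0), whereas 0x1B has both set (level 3).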

Finally, the custom OS must properly handle XMM registers during context switches, by saving and restoring them when necessary.
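
One common way to do the saving and restoring is the FXSAVE/FXRSTOR instruction pair, which is what the CR4.OSFXSR bit advertises support for. The sketch below is only an illustration: the struct and function names are hypothetical, the inline asm assumes a GCC-compatible compiler targeting x86, and where the save area lives (for example, inside each task's control block) is up to the OS.

```c
#include <stdalign.h>
#include <stdint.h>

/* FXSAVE writes a 512-byte image holding the x87/MMX state, MXCSR and the
 * XMM registers. The memory operand must be 16-byte aligned, otherwise the
 * instruction raises a general-protection fault. */
struct fxsave_area {
    alignas(16) uint8_t bytes[512];
};

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Save the outgoing task's FPU/SSE state (hypothetical name). */
static inline void fpu_sse_save(struct fxsave_area *area)
{
    __asm__ volatile("fxsave %0" : "=m"(*area));
}

/* Restore the incoming task's FPU/SSE state (hypothetical name). */
static inline void fpu_sse_restore(const struct fxsave_area *area)
{
    __asm__ volatile("fxrstor %0" : : "m"(*area));
}
#endif
```

A context switch would then call fpu_sse_save() on the outgoing task's area and fpu_sse_restore() on the incoming one. Doing this eagerly on every switch is the simplest choice; lazy schemes based on CR0.TS exist but are more involved.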

anol asked Jul 22 '15 12:07



1 Answer

If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case all instructions that touch xmm regs will fault.

Took me a sec to find, but http://wiki.osdev.org/SSE explains how to alter CR0 and CR4 to allow SSE instructions to run on bare metal without #UD.


My first thought on your old version of the question was that you might have compiled your program with -mavx, -march=sandybridge or equivalent, causing the compiler to emit the VEX-encoded version of every SSE instruction. A Pentium M has no AVX, so those would raise #UD as well.

CVTSI2SD   xmm1, r/m32         ; SSE2
VCVTSI2SD  xmm1, xmm2, r/m32   ; AVX

See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.


Related: Which versions of Windows support/require which CPU multimedia extensions? has some details about how to check for support for AVX and AVX512 (which also introduce new architectural state, so the OS has to set a bit or the HW will fault). It's coming at it from the other angle, but the links should indicate how to activate / disable AVX support.
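
As a rough illustration of the user-space side of that check: AVX is only usable when CPUID leaf 1 reports both AVX and OSXSAVE in ECX, and the OS has enabled the SSE and AVX state components in XCR0 (read with XGETBV). The function below is a sketch with a made-up name; it takes the raw register values as arguments so the logic is visible without depending on any particular CPUID wrapper.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has set CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE           (1ull << 1) /* XMM state enabled by the OS */
#define XCR0_AVX           (1ull << 2) /* upper YMM state enabled by the OS */

/* Hypothetical helper. cpuid1_ecx is ECX from CPUID leaf 1; xcr0 is the
 * result of XGETBV with ECX = 0. XGETBV itself faults unless OSXSAVE is
 * set, so real code should only read XCR0 after checking that bit; pass
 * 0 for xcr0 otherwise. */
static inline bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_ecx  = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_xcr0 = XCR0_SSE | XCR0_AVX;

    if ((cpuid1_ecx & need_ecx) != need_ecx)
        return false;
    return (xcr0 & need_xcr0) == need_xcr0;
}
```

On the OS side the corresponding step is setting CR4.OSXSAVE (bit 18) and then writing XCR0 with XSETBV, much like the CR0/CR4 setup for plain SSE.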

Peter Cordes answered Nov 15 '22 09:11
