
How to set MMX registers in a Windows exception handler to emulate unsupported 3DNow! instructions

I'm trying to revive an old Win32 game that uses the 3DNow! instruction set for 3D rendering.

On modern OSs like Win7 - Win10, instructions like PFADD or PFMUL are not allowed, and the program throws an exception.

Since the number of 3DNow! instructions used by the game is very limited, in my VS2008 MFC program I tried to use vectored exception handling to get the values of the MMX registers, emulate the 3DNow! instructions in C code, and push the results back to the MMX registers that 3DNow! operates on.

So far I have succeeded in the first two steps (I get the MMX register values from the ExceptionInfo->ContextRecord->ExtendedRegisters byte array at offset 32 and use C float arithmetic for the calculations). My problem is that, no matter how I try to update the MMX registers, their values seem to stay unchanged.

Assuming that my _asm statements might be wrong, I also ran a minimal test using a simple statement like this:

_asm movq mm0, mm7

This statement executes without raising a further exception, but when I retrieve the MMX register values afterwards I find that they are still unchanged.

How can I make the assignment effective?

gho asked Oct 27 '17 07:10


1 Answer

"On modern OSs like Win7 - Win10 instructions like PFADD or PFMUL are not allowed"

More likely it's your CPU that doesn't support 3DNow!: AMD dropped it for Bulldozer-family, and Intel never supported it. So unless you're running modern Windows on an Athlon64 / Phenom (or a Via C3), your CPU doesn't support it.

(Fun fact: PREFETCHW was originally a 3DNow! instruction, and is still supported (with its own CPUID feature bit). For a long time Intel CPUs ran it as a NOP, but Broadwell and later (IIRC) do actually prefetch a cache line into Exclusive state with a Read-For-Ownership.)


Unless this game only ever ran on AMD hardware, it must have a code path that avoids 3DNow. Fix its CPU detection to stop detecting your CPU as having 3DNow. (Maybe you have a recent AMD, and it assumes any AMD has 3DNow?)

(update on that: OP's comments say that the other code paths don't work for some reason. That's a problem.)


Returning from an exception handler probably restores registers from saved state, so it's not surprising that changing register values in the exception handler has no effect on the main program.

Apparently updating ExtendedRegisters in memory doesn't do the trick, though, so that's only a copy of the saved state.

The answer to modifying MMX registers from an exception handler is probably the same as for integer or XMM registers, so look up MS's documentation for that.
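As a minimal sketch of the emulation step itself (not of how Windows restores state), the code below locates an MMX register inside an FXSAVE-format image and applies a PFMUL in plain C. It assumes the buffer uses the FXSAVE layout, where ST(n)/MMn lives in a 16-byte slot at offset 32 + 16*n, which matches the offset 32 mentioned in the question. The helper names are hypothetical, and whether the OS reloads the registers from this buffer on return is not something this sketch settles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: `area` points at a 512-byte FXSAVE-format image, such as the
   ExtendedRegisters array of a 32-bit CONTEXT. ST(n)/MMn occupies a 16-byte
   slot starting at byte 32 + 16*n; the MMX value is the low 8 bytes. */
enum { FXSAVE_MM0_OFFSET = 32, FXSAVE_SLOT_SIZE = 16 };

/* Hypothetical helper: copy the two packed floats of MMn out of the image. */
static void mmx_read_pair(const uint8_t *area, int n, float out[2])
{
    assert(n >= 0 && n < 8);
    memcpy(out, area + FXSAVE_MM0_OFFSET + FXSAVE_SLOT_SIZE * n,
           2 * sizeof(float));
}

/* Hypothetical helper: write two packed floats into the MMn slot. */
static void mmx_write_pair(uint8_t *area, int n, const float in[2])
{
    assert(n >= 0 && n < 8);
    memcpy(area + FXSAVE_MM0_OFFSET + FXSAVE_SLOT_SIZE * n, in,
           2 * sizeof(float));
}

/* Emulates PFMUL mm<dst>, mm<src>: element-wise multiply of the two packed
   single-precision floats, with the result written to the dst slot. */
static void emulate_pfmul(uint8_t *area, int dst, int src)
{
    float d[2], s[2];
    mmx_read_pair(area, dst, d);
    mmx_read_pair(area, src, s);
    d[0] *= s[0];
    d[1] *= s[1];
    mmx_write_pair(area, dst, d);
}
```

Using memcpy rather than pointer casts keeps the code free of alignment and aliasing problems, since the byte array has no guaranteed float alignment.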


Alternative suggestion:

Rewrite the 3DNow code to use SSE2. (You said there's only a tiny amount of it?). SSE2 is baseline for x86-64, and generally safe to assume for 32-bit x86.

Without source, you could still modify the asm for the few functions that use 3DNow. You can literally just change the instructions to use 64-bit loads/stores into XMM registers instead of 3DNow! 64-bit loads/stores, and replace PFMUL with mulps, etc. (This could get slightly hairy if you run out of registers and the 3DNow code used a memory source operand. addps xmm0, [mem] requires 16B-aligned memory, and does a 16 byte load. So you may have to add a spill/reload to borrow another register as a temporary).
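The substitution described above can be sketched with intrinsics instead of hand-patched asm, which makes it easy to compile and check. This is only an illustration: the function names are hypothetical, and the load uses a movq-style intrinsic (`_mm_loadl_epi64`) because it accepts unaligned memory of any type; like a movsd load, it zeroes the upper 64 bits.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE intrinsics */

/* Load two adjacent floats into the low 64 bits of an XMM register.
   The upper 64 bits are zeroed, so there is no 16-byte alignment
   requirement and no 16-byte memory access. */
static __m128 load_pair(const float *p)
{
    return _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)p));
}

/* Stand-in for "movq mm0,[dst] / pfmul mm0,[src] / movq [dst],mm0". */
static void pfmul_via_sse(float dst[2], const float src[2])
{
    assert(dst && src);
    __m128 d = load_pair(dst);
    __m128 s = load_pair(src);       /* the "spill/reload" temporary */
    d = _mm_mul_ps(d, s);            /* mulps: upper lanes are 0 * 0 */
    _mm_storel_pi((__m64 *)dst, d);  /* movlps: store the low two floats */
}
```

Loading the memory operand into its own register first is what avoids the 16-byte aligned access that `mulps xmm0, [mem]` would perform.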

If you don't have room to rewrite the functions in-place, put in a jmp to somewhere you do have room to add new code.

Most of the 3DNow instructions have equivalents in SSE, but you may need some extra movaps instructions to copy registers around to implement PFCMPGE. If you can ignore the possibility of NaN, you can use cmpps with a not-less-than predicate. (Without AVX, SSE only has compare predicates based on less-than or not-less-than).
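A rough sketch of the PFCMPGE replacement, assuming NaN inputs can be ignored as stated above. The function name is hypothetical; `_mm_cmpnlt_ps` is the intrinsic for cmpps with the not-less-than predicate.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2; also provides the SSE compare intrinsics */

/* PFCMPGE dst, src sets each lane to all-ones where dst >= src, else zero.
   cmpnltps computes !(dst < src) per lane: identical for ordered inputs,
   but it also yields all-ones when either input is NaN. */
static void pfcmpge_via_cmpnltps(const float dst[2], const float src[2],
                                 uint32_t mask[2])
{
    assert(dst && src && mask);
    uint32_t lanes[4];
    __m128 d = _mm_setr_ps(dst[0], dst[1], 0.0f, 0.0f);
    __m128 s = _mm_setr_ps(src[0], src[1], 0.0f, 0.0f);
    __m128 m = _mm_cmpnlt_ps(d, s);
    _mm_storeu_si128((__m128i *)lanes, _mm_castps_si128(m));
    mask[0] = lanes[0];
    mask[1] = lanes[1];
}
```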

PFSUBR is easy to emulate with a spare register: just copy and subps to reverse. (Or SUBPS and invert the sign with XORPS.) PFRCPIT1 and PFRSQIT1 (the first refinement iterations for reciprocal and reciprocal-sqrt) and so on don't have a single-instruction implementation, but you can probably just use sqrtps and divps if you don't want to implement Newton-Raphson iterations with mulps and addps (or with FMA's vfmadd). Modern CPUs are much faster than what this game was designed for.
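Both PFSUBR variants and the sqrtps + divps shortcut might look like this with intrinsics. This is a sketch under the assumption that full-precision results are acceptable in place of the 3DNow! approximate-then-refine sequences; all function names are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE intrinsics */

/* PFSUBR dst, src computes dst = src - dst. Variant 1: copy, then subps. */
static __m128 pfsubr_via_copy(__m128 dst, __m128 src)
{
    __m128 tmp = src;             /* movaps tmp, src */
    return _mm_sub_ps(tmp, dst);  /* subps  tmp, dst */
}

/* Variant 2: subtract in the normal direction, then flip the sign bits.
   Differs from variant 1 only in the sign of a zero result. */
static __m128 pfsubr_via_sign_flip(__m128 dst, __m128 src)
{
    const __m128 sign_bits = _mm_set1_ps(-0.0f);         /* 0x80000000 */
    return _mm_xor_ps(_mm_sub_ps(dst, src), sign_bits);  /* -(dst - src) */
}

/* Full-precision 1/sqrt(x) with sqrtps + divps, standing in for a
   reciprocal-sqrt approximation followed by refinement steps. */
static __m128 rsqrt_via_sqrt_div(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}

/* Hypothetical self-check using exactly representable values. */
static void pfsubr_demo(void)
{
    float r[4];
    __m128 dst = _mm_setr_ps(1.0f, 5.0f, 0.0f, 0.0f);
    __m128 src = _mm_setr_ps(4.0f, 2.0f, 0.0f, 0.0f);
    _mm_storeu_ps(r, pfsubr_via_copy(dst, src));
    assert(r[0] == 3.0f && r[1] == -3.0f);
    _mm_storeu_ps(r, pfsubr_via_sign_flip(dst, src));
    assert(r[0] == 3.0f && r[1] == -3.0f);
}
```

Note that `rsqrt_via_sqrt_div` returns infinity for lanes holding zero, so unused upper lanes are better filled with 1.0 than left at zero if floating-point exception flags matter.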


You can load / store a pair of single-precision floats from/to memory into the bottom 64 bits of an XMM register using movsd (the SSE2 double-precision load/store instruction). You can also store a pair with movlps, but still use movsd for loading because it zeros the upper half instead of merging, so it doesn't have a dependency on the old value of the register.

Use movdq2q mm0, xmm0 and movq2dq xmm0, mm0 to move data between XMM and MMX.

Use movaps xmm1, xmm0 to copy registers, even if your data is only in the low half. (movsd xmm1, xmm0 merges the low half into the original high half. movq xmm1, xmm0 zeros the high half.)
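The difference between the merging and zeroing register copies can be illustrated with the intrinsics that correspond to those instructions. This is only a sketch: the function names are hypothetical, and a compiler is free to pick different instructions with the same result.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */

/* Like movsd xmm1, xmm0 (register form): the low 64 bits come from src,
   the high 64 bits of dst are kept, so the result depends on dst. */
static __m128 copy_low_merge(__m128 dst, __m128 src)
{
    return _mm_castpd_ps(_mm_move_sd(_mm_castps_pd(dst), _mm_castps_pd(src)));
}

/* Like movq xmm1, xmm0: the low 64 bits come from src, the high 64 bits
   are zeroed, so there is no dependency on the old destination. */
static __m128 copy_low_zero(__m128 src)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(src)));
}

/* Hypothetical self-check showing which lanes survive each copy. */
static void copy_demo(void)
{
    float r[4];
    __m128 old = _mm_setr_ps(9.0f, 9.0f, 7.0f, 8.0f);
    __m128 src = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    _mm_storeu_ps(r, copy_low_merge(old, src));
    assert(r[0] == 1.0f && r[1] == 2.0f && r[2] == 7.0f && r[3] == 8.0f);
    _mm_storeu_ps(r, copy_low_zero(src));
    assert(r[0] == 1.0f && r[1] == 2.0f && r[2] == 0.0f && r[3] == 0.0f);
}
```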

addps and mulps work fine with zeros in the upper half. (They can slow down if any garbage in the upper half produces a denormal result, so prefer keeping the upper half zeroed.) See http://felixcloutier.com/x86/ for an instruction-set reference (and other links in the x86 tag wiki).

Any shuffling of FP data can be done in XMM registers with shufps or pshufd instead of copying back to MMX registers to use whatever MMX shuffles.

Peter Cordes answered Sep 18 '22 18:09