I'm trying to revive an old Win32 game that uses 3DNow! instruction set to make 3D rendering.
On modern OSs like Win7 - Win10 instructions like FPADD or FPMUL are not allowed and the program throws an exception.
Since the number of 3DNow! instuctions used by the game is very limited, in my VS2008 MFC program I tried to use vectored exception handling to get the value of MMX registers, emulate the 3DNow! instructions by C code and push the values back to the processor 3DNow! registers.
So far I succeeded in first two steps (I get mmx register values from ExceptionInfo->ExtendedRegisters
byte array at offset 32 and use float type C instructions to make calculations), but my problem is that, no matter how I try to update the MMX register values the register values seem to stay unchanged.
Assuming that my _asm
statements might be wrong, I did also some minimal test using simple statements like this:
_asm movq mm0 mm7
This statement is executed without further exceptions, but when retrieving the MMX register values I still find that the original values were unchanged.
How can I make the assignment effective?
On modern OSs like Win7 - Win10 instructions like FPADD or FPMUL are not allowed
More likely your CPU doesn't support 3DNow! AMD dropped it for Bulldozer-family, and Intel never supported it. So unless you're running modern Windows on an Athlon64 / Phenom (or a Via C3), your CPU doesn't support it.
(Fun fact: PREFETCHW
was originally a 3DNow! instruction, and is still supported (with its own CPUID feature bit). For a long time Intel CPUs ran it as a NOP, but Broadwell and later (IIRC) do actually prefetch a cache line into Exclusive state with a Read-For-Ownership.)
Unless this game only ever ran on AMD hardware, it must have a code path that avoids 3DNow. Fix its CPU detection to stop detecting your CPU as having 3DNow. (Maybe you have a recent AMD, and it assumes any AMD has 3DNow?)
(update on that: OP's comments say that the other code paths don't work for some reason. That's a problem.)
Returning from an exception handler probably restores registers from saved state, so it's not surprising that changing register values in the exception handler has no effect on the main program.
Apparently updating ExtendedRegisters
in memory doesn't do the trick, though, so that's only a copy of the saved state.
The answer to modifying MMX registers from an exception handler is probably the same as for integer or XMM registers, so look up MS's documentation for that.
Alternative suggestion:
Rewrite the 3DNow code to use SSE2. (You said there's only a tiny amount of it?). SSE2 is baseline for x86-64, and generally safe to assume for 32-bit x86.
Without source, you could still modify the asm for the few functions that use 3DNow. You can literally just change the instructions to use 64-bit loads/stores into XMM registers instead of 3DNow! 64-bit loads/stores, and replace PFMUL with mulps
, etc. (This could get slightly hairy if you run out of registers and the 3DNow code used a memory source operand. addps xmm0, [mem]
requires 16B-aligned memory, and does a 16 byte load. So you may have to add a spill/reload to borrow another register as a temporary).
If you don't have room to rewrite the functions in-place, put in a jmp
to somewhere you do have room to add new code.
Most of the 3DNow instructions have equivalents in SSE, but you may need some extra movaps
instructions to copy registers around to implement PFCMPGE
. If you can ignore the possibility of NaN, you can use cmpps
with a not-less-than predicate. (Without AVX, SSE only has compare predicates based on less-than or not-less-than).
PFSUBR
is easy to emulate with a spare register, just copy and subps
to reverse. (Or SUBPS and invert the sign with XORPS). PFRCPIT1
(reciprocal-sqrt first iteration of refinement) and so on don't have a single-instruction implementation, but you can probably just use sqrtps
and divps
if you don't want to implement Newton-Raphson iterations with mulps and addps (or with AVX vfmadd
). Modern CPUs are much faster than what this game was designed for.
You can load / store a pair of single-precision floats from/to memory into the bottom 64 bits of an XMM register using movsd
(the SSE2 double-precision load/store instruction). You can also store a pair with movlps
, but still use movsd
for loading because it zeros the upper half instead of merging, so it doesn't have a dependency on the old value of the register.
Use movdq2q mm0, xmm0
and movq2dq xmm0, mm0
to move data between XMM and MMX.
Use movaps xmm1, xmm0
to copy registers, even if your data is only in the low half. (movsd xmm1, xmm0
merges the low half into the original high half. movq xmm1, xmm0
zeros the high half.)
addps
and mulps
work fine with zeros in the upper half. (They can slow down if any garbage (in the upper half) produces a denormal result, so prefer keeping the upper half zeroed). See http://felixcloutier.com/x86/ for an instruction-set reference (and other links in the x86 tag wiki.
Any shuffling of FP data can be done in XMM registers with shufps
or pshufd
instead of copying back to MMX registers to use whatever MMX shuffles.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With