
How can you insert a NaN into an xmm register?

For the function I am writing, I would like to return a NaN if the input doesn't make sense.

What is the easiest way to insert a NaN into an xmm register?

Asked Jan 21 '20 09:01 by Markus


1 Answer

All-ones is a quiet (non-signalling, aka normal) NaN, which is what you want. The easiest way to produce one is with SSE2 pcmpeqd xmm0,xmm0, which sets every bit in the register to 1, i.e. two's complement integer -1. (See also the Q&As "Set all bits in CPU register to 1 efficiently" and "What are the best instruction sequences to generate vector constants on the fly?")
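Why all-ones is a quiet NaN can be checked with a small C model of the bit patterns. This is a sketch that reinterprets raw bits with memcpy rather than using the SSE instructions themselves; the helper names are made up for illustration. A pattern is a NaN when the exponent is all ones and the mantissa is nonzero, and it is quiet when the top mantissa bit (bit 51 for double, bit 22 for float) is set. All-ones satisfies both conditions at either width.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret raw bits as double / float, the way an SSE FP instruction
   reinterprets whatever integer bits pcmpeqd left in the register. */
static double bits_to_double(uint64_t bits) { double d; memcpy(&d, &bits, sizeof d); return d; }
static float  bits_to_float(uint32_t bits)  { float f;  memcpy(&f, &bits, sizeof f); return f; }

/* Nonzero if the pattern is a NaN whose quiet bit (top mantissa bit) is set. */
static int is_quiet_nan64(uint64_t bits) {
    return isnan(bits_to_double(bits)) && ((bits >> 51) & 1);
}

static int is_quiet_nan32(uint32_t bits) {
    return isnan(bits_to_float(bits)) && ((bits >> 22) & 1);
}
```

Calling is_quiet_nan64(~UINT64_C(0)) and is_quiet_nan32(~UINT32_C(0)) both report a quiet NaN.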

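It's actually a -NaN: the sign bit is set. If that's undesirable, shift it right as integers (psrld xmm0,1) to clear the sign bit of every element; what's left is still a quiet NaN. Dividing zero by zero (xorps xmm0,xmm0 / divpd xmm0,xmm0) also produces a NaN, but on x86 that result is the default NaN ("QNaN floating-point indefinite"), which has the sign bit set too.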
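Here is a sketch of what psrld xmm0,1 does to one 64-bit double lane of the all-ones register. It is modelled in plain C, and the helper names are hypothetical. Because psrld shifts each 32-bit element separately, a double lane becomes 0x7FFFFFFF7FFFFFFF rather than 0x7FFFFFFFFFFFFFFF. That is still a positive quiet NaN, and each float lane becomes 0x7FFFFFFF, also a positive quiet NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Model of `psrld xmm, 1` applied to one 64-bit lane: the two 32-bit halves
   are shifted right independently (psrld works on dwords, not qwords). */
static uint64_t psrld1_on_qword(uint64_t q) {
    uint32_t lo = (uint32_t)q;
    uint32_t hi = (uint32_t)(q >> 32);
    lo >>= 1;
    hi >>= 1;
    return ((uint64_t)hi << 32) | lo;
}

/* Reinterpret a 64-bit pattern as a double. */
static double as_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With this model, the all-ones input has signbit set, and psrld1_on_qword(~UINT64_C(0)) is a NaN with signbit clear. The exponent field stays all ones because the sign bit is the only exponent-adjacent bit that the shift clears.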


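Math functions that want to return NaN often also want to make sure the FP-invalid sticky exception bit gets set in MXCSR (or that an exception is actually raised, if your caller unmasked it). Arithmetic on a quiet NaN won't do that. For instructions like mulsd or addsd, a quiet NaN input just propagates without raising any exception. The invalid-operation flag is set only by SNaN inputs or by operations with no meaningful result, such as 0/0, 0 * Inf or Inf - Inf. So on the error path, compute 0.0/0.0 instead, e.g.

    ...
.error_return_path:
    xorps     xmm0, xmm0
    divsd     xmm0, xmm0       ; 0.0/0.0: FP-invalid operation, result is the default NaN
    ret

Or divss for single-precision float; divpd / divps would also be appropriate. On x86 the default NaN this produces has its sign bit set, just like the all-ones pattern.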

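If you do want the all-ones pattern, multiplying or adding it with itself is harmless. The result of mulsd or addsd with NaN inputs is definitely still a NaN, and on x86 it keeps the first source operand's payload, so it's still all-ones.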
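The payload claim can be checked with a C sketch that squares the all-ones NaN at run time and inspects the result's bits. The function name is hypothetical. On x86-64 the multiply compiles to mulsd, and the result is expected to be all-ones. Other ISAs may canonicalize NaNs (RISC-V, for example, returns a fixed canonical NaN), so only the "still a NaN" part is portable.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Square the all-ones quiet NaN at run time and return the result's bits.
   volatile prevents the compiler from folding the multiply away. */
static uint64_t all_ones_nan_squared_bits(void) {
    uint64_t bits = ~UINT64_C(0);
    double q;
    memcpy(&q, &bits, sizeof q);
    volatile double in = q;
    double out = in * in;          /* mulsd with two quiet-NaN operands on x86-64 */
    uint64_t out_bits;
    memcpy(&out_bits, &out, sizeof out_bits);
    return out_bits;
}
```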

Having the return value be a result of mulsd or addsd (or divsd) also has the advantage that if the caller uses that register repeatedly in a loop, it won't have domain-crossing bypass latency. (On Sandybridge-family, this penalty lasts forever: e.g. every addsd xmm1, xmm0 would have an extra cycle of latency from xmm1 input to xmm1 output if xmm0 came from pcmpeqd, even if that was long ago and the integer-SIMD uop has already retired.)


You might even be able to do it branchlessly with cmpsd or cmppd: orps the resulting 0 / -1 mask into a result to make each element either NaN or unchanged. If some other calculation will set (or has already set) the FP-invalid flag, or if you don't care about that flag, you're all set.
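The compare-and-OR trick can be modelled for one SIMD lane in plain C. This is a sketch: the domain check (x must not be negative, as for a hypothetical sqrt-like function) and the function name are assumptions for illustration. The comparison produces a 0 or all-ones mask like cmpltpd. ORing that mask into the result's bits either leaves the result untouched or turns it into the all-ones NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* One lane of cmppd + orpd: mask is 0 or all-ones depending on a
   hypothetical domain check, then OR'd into the bits of `result`. */
static double or_nan_if_negative(double x, double result) {
    uint64_t mask = (x < 0.0) ? ~UINT64_C(0) : 0;   /* 0 or -1, like cmpltpd */
    uint64_t bits;
    memcpy(&bits, &result, sizeof bits);
    bits |= mask;                                   /* like orpd / orps */
    memcpy(&result, &bits, sizeof bits);
    return result;
}
```

For example, or_nan_if_negative(4.0, 2.0) returns 2.0 unchanged, while or_nan_if_negative(-1.0, 123.0) returns NaN. In real SIMD code the compare produces the mask directly, so no branch is involved.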

Beware of lengthening the critical path with the extra cmp / or. If you expect failures to be very rare, you might rather still compare and branch, e.g. with movmskpd / test eax,eax / jnz on a cmppd result to see whether either bit was set, meaning one of the SIMD elements failed some check.
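A plain-C model of the compare-then-branch alternative follows; it is a sketch with hypothetical names and the same assumed domain check (x must not be negative). movmskpd gathers the top bit of each 64-bit compare-mask lane into the low bits of a general-purpose register. A single test / jnz on that value then tells you whether any lane failed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of movmskpd: collect the top (sign) bit of each 64-bit lane of a
   compare mask into bit 0 and bit 1 of an int. */
static int movmskpd_model(const uint64_t mask[2]) {
    return (int)((mask[0] >> 63) | ((mask[1] >> 63) << 1));
}

/* Compare both lanes against the hypothetical check, then test the packed
   bits once: nonzero means at least one lane failed. */
static int any_lane_negative(const double x[2]) {
    uint64_t mask[2];
    for (int i = 0; i < 2; i++)
        mask[i] = (x[i] < 0.0) ? ~UINT64_C(0) : 0;   /* like cmpltpd */
    return movmskpd_model(mask) != 0;                /* like test eax,eax / jnz */
}
```

Only the rare failing case would take the branch to the error path, which keeps the common path's dependency chain free of the extra OR.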

Answered Nov 17 '22 17:11 by Peter Cordes