For the function I am writing, I would like to return a NaN if the input doesn't make sense.
How can I insert a NaN into an xmm register the easiest way?
All-ones is a quiet (non-signalling, aka normal) NaN, which is what you want. The easiest way to produce one is with SSE2 pcmpeqd xmm0,xmm0 to set every bit in the register to 1, i.e. 2's complement integer -1. (See also: Set all bits in CPU register to 1 efficiently / What are the best instruction sequences to generate vector constants on the fly?)
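It's actually a -NaN: the sign bit is set. Consider an integer right shift (psrld xmm0,1) if that's undesirable. Dividing zero by zero (xorps xmm0,xmm0 / divpd xmm0,xmm0) also produces a NaN, but x86's default NaN has the sign bit set as well, so it doesn't help with the sign.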
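The bit patterns involved can be checked from C with SSE2 intrinsics. This is a minimal sketch, assuming an x86 target and a compiler that provides emmintrin.h; the helper names (low_bits, all_ones_nan, positive_nan) are made up for illustration.

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// Hypothetical helper: raw bits of the low 64-bit element of a vector.
static uint64_t low_bits(__m128d v) {
    uint64_t u;
    _mm_storel_epi64((__m128i *)&u, _mm_castpd_si128(v));
    return u;
}

// pcmpeqd xmm0,xmm0: every element compares equal to itself,
// so every bit in the register becomes 1.
static __m128d all_ones_nan(void) {
    __m128i x = _mm_setzero_si128();
    return _mm_castsi128_pd(_mm_cmpeq_epi32(x, x));
}

// psrld xmm0,1 on the all-ones value: each 32-bit element becomes
// 0x7FFFFFFF, which clears the sign bit of a float or a double
// and still leaves a quiet NaN.
static __m128d positive_nan(void) {
    __m128i x = _mm_setzero_si128();
    __m128i ones = _mm_cmpeq_epi32(x, x);
    return _mm_castsi128_pd(_mm_srli_epi32(ones, 1));
}
```

Here low_bits(all_ones_nan()) is 0xFFFFFFFFFFFFFFFF (sign bit set), and low_bits(positive_nan()) is 0x7FFFFFFF7FFFFFFF (sign bit clear).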
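Math functions that want to return NaN often also want to make sure the FP-invalid sticky exception bit gets set in MXCSR (or actually raise an exception if your caller unmasked that exception). Multiplying or adding the all-ones NaN with itself does not do that: arithmetic on a quiet NaN just propagates the NaN without setting any flag. You need an operation that is itself invalid, such as 0 / 0 (or arithmetic on a signalling NaN). e.g.
...
.error_return_path:
xorps xmm0, xmm0
divsd xmm0, xmm0 ; 0.0 / 0.0: an FP-invalid operation, result is the default quiet NaN.
ret
Or divss for single-precision float. divpd / divps would also be appropriate.
The result of 0 / 0 is x86's default NaN (sign bit set, only the top mantissa bit set), not all-ones. If you want the all-ones pattern and don't need the flag, the bit-pattern for multiply or add of NaN with NaN is still a NaN, and on x86 it keeps the payload of the first source operand, so still all-ones.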
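Which operations actually set the flag can be checked by reading MXCSR directly. Under IEEE 754, arithmetic on a quiet NaN propagates it without signalling, while 0 / 0 is an invalid operation. The following is a rough sketch in C with SSE intrinsics, assuming an x86 target with GCC or Clang; the probe type and function names are hypothetical, and the volatile variables are only there to stop the compiler from folding or moving the arithmetic.

```c
#include <xmmintrin.h>  // _MM_SET_EXCEPTION_STATE, _MM_GET_EXCEPTION_STATE
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>
#include <string.h>

// Hypothetical result type: was FP-invalid set, and which bits came out?
typedef struct {
    int invalid;
    uint64_t bits;
} probe;

// Store the result, then read the sticky invalid bit from MXCSR.
static probe finish(__m128d v) {
    volatile double sink = _mm_cvtsd_f64(v);  // force the operation to complete first
    probe p;
    p.invalid = (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_INVALID) != 0;
    double d = sink;
    memcpy(&p.bits, &d, sizeof d);
    return p;
}

// mulsd of the all-ones quiet NaN with itself.
static probe probe_nan_times_nan(void) {
    volatile int minus_one = -1;
    _MM_SET_EXCEPTION_STATE(0);  // clear the sticky exception flags
    // Same bit pattern that pcmpeqd xmm0,xmm0 produces.
    __m128d nan = _mm_castsi128_pd(_mm_set1_epi32(minus_one));
    return finish(_mm_mul_sd(nan, nan));
}

// divsd of 0.0 by 0.0.
static probe probe_zero_div_zero(void) {
    volatile double zero = 0.0;
    _MM_SET_EXCEPTION_STATE(0);  // clear the sticky exception flags
    __m128d z = _mm_set_sd(zero);
    return finish(_mm_div_sd(z, z));
}
```

On x86, probe_nan_times_nan() should report invalid == 0 with bits still all-ones, and probe_zero_div_zero() should report invalid != 0 with bits 0xFFF8000000000000, the default NaN.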
Having the return value be a result of mulsd or addsd (or divsd) also has the advantage that if the caller uses that register repeatedly in a loop, it won't have domain-crossing bypass latency. (On Sandybridge-family, this lasts forever, e.g. every addsd xmm1, xmm0 would have an extra cycle of latency from xmm1 input to xmm1 output if xmm0 came from pcmpeqd, even if that was long ago and the integer-SIMD uop has already retired.)
You might even be able to do it branchlessly if you use cmpsd or cmppd: you can orps that 0 / -1 mask into a result to make it NaN or unchanged. If some other calculation will set (or will already have set) the FP-invalid flag, or if you don't care about that, you're all set.
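The branchless idea might look like this in C with SSE2 intrinsics. It is only a sketch: the function name and the "negative input is invalid" condition are made up for illustration, and it uses the double-precision orpd where the text mentions orps (the bitwise result is the same).

```c
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical wrapper: returns `result` unchanged when x >= 0,
// or an all-ones NaN when x < 0.
static double nan_if_negative(double x, double result) {
    __m128d vx   = _mm_set_sd(x);
    __m128d vres = _mm_set_sd(result);
    // cmpltsd: low element becomes all-ones if x < 0, else all-zero.
    __m128d mask = _mm_cmplt_sd(vx, _mm_setzero_pd());
    // orpd: OR-ing with all-ones gives NaN, OR-ing with zero changes nothing.
    return _mm_cvtsd_f64(_mm_or_pd(vres, mask));
}
```

For example, nan_if_negative(4.0, 2.0) returns 2.0, while nan_if_negative(-1.0, 2.0) returns a NaN.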
Beware of lengthening the critical path with an extra cmp / or; if you expect the error case to be super rare, you might rather still compare and branch, e.g. with movmskpd / test eax,eax / jnz on a cmppd result to see if either bit was set, meaning one of the SIMD elements failed some check.
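The compare-and-branch variant could be sketched like this, again in C with SSE2 intrinsics and a hypothetical "element is negative" check standing in for whatever condition the real function tests.

```c
#include <emmintrin.h>  // SSE2 intrinsics

// Returns nonzero if either element of v is negative.
static int any_negative(__m128d v) {
    // cmpltpd: each element becomes all-ones if it is < 0, else all-zero.
    __m128d bad = _mm_cmplt_pd(v, _mm_setzero_pd());
    // movmskpd collects the two sign bits into an integer; nonzero means
    // at least one element failed, so the caller can branch to the
    // rarely taken error path.
    return _mm_movemask_pd(bad) != 0;
}
```

A caller would do the normal computation unconditionally and only jump to the NaN-returning path when any_negative(v) is nonzero.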