Fast inverse square root on the iPhone

Question

The fast inverse square root function used by SGI/3dfx, and most notably in Quake, is often cited as being faster than the equivalent assembly instruction, but the posts claiming that seem quite dated. I was curious about its performance on more modern hardware, particularly on mobile devices like the iPhone. I wouldn't be surprised if the Quake inverse sqrt is no longer a worthwhile optimization on desktop systems, but what about an iPhone project involving a lot of 3D math? Would it be worthwhile to include?

Stephen Canon · Accepted Answer

No.

The NEON instruction set (like every other vector ISA*) has a hardware approximate reciprocal square root instruction that is much faster than that oft-cited "trick". Use it instead if reciprocal square root is actually a performance bottleneck in your code (as always, benchmark first; don't spend time optimizing something if you have no hard evidence that its performance matters).

You can get at it by writing your own assembly (inline or otherwise) with the vrsqrte.f32 instruction, or from C, Objective-C, or C++ by including the <arm_neon.h> header and using the vrsqrte_f32() intrinsic.

[*] On SSE it's rsqrtss/rsqrtps; on PowerPC/AltiVec it's frsqrte/vrsqrtefp.
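For comparison on the desktop side, a minimal sketch of the SSE scalar form through the <xmmintrin.h> intrinsics might look like the following. The helper name rsqrt_estimate is an assumption for this example, and the non-SSE branch is only there so the code compiles elsewhere.

```c
#include <assert.h>
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* rsqrt_estimate is a hypothetical helper name used only in this sketch. */
static float rsqrt_estimate(float x) {
    assert(x > 0.0f);
#if defined(__SSE__)
    /* rsqrtss: approximate 1/sqrt(x) computed in the low lane. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    /* Fallback for targets without SSE: plain C, no approximation. */
    return 1.0f / sqrtf(x);
#endif
}
```

The result is an estimate, not a correctly rounded value, so compare with a tolerance rather than for equality.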

Tags: performance, optimization, floating-point, iphone, mathematical-optimization

TaylorP
