
Fastest Inverse Square Root on iPhone

I'm working on an iPhone app that runs certain physics calculations thousands of times per second, and I'm optimizing the code to improve the frame rate. One piece I want to improve is the inverse square root. Right now I am using the Quake 3 fast inverse square root method. After some research, however, I read that the NEON instruction set offers a faster way. I am unfamiliar with inline assembly and cannot figure out how to use NEON. I tried integrating the math-neon library, but I get compiler errors because most of its NEON-based functions lack a return statement.
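For reference, the Quake 3 method mentioned above is usually written along these lines. This is only a minimal sketch, not the code from the app: the name `q_rsqrt` is made up for illustration, and `memcpy` is used instead of the original pointer cast so the type punning stays well-defined in C.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative name; approximates 1/sqrt(x) for positive, finite x. */
float q_rsqrt(float x) {
    uint32_t bits;
    float y;

    /* Reinterpret the float's bit pattern as an integer. */
    memcpy(&bits, &x, sizeof bits);

    /* Magic constant minus half the bits gives a rough first estimate. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step tightens the estimate. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

With a single Newton-Raphson step the result is only approximate, which is usually acceptable for things like normalizing vectors in a physics loop.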

EDIT: I've suddenly been getting some "unclear question" close votes. Although I think it's quite clear, and those who answered obviously understood, maybe some people need it stated explicitly: How do you use NEON to perform faster calculations? And is it really the fastest method for getting the inverse square root on the iPhone?

EDIT: I did some more formal testing of NEON vs. Quake today, but if anything, I'm even more uncertain about the outcome now:

  • In-App Testing: (An app that is currently in the App Store, with its invsqrt method modified)

    1. Quake method (leading by a marginal increase in average FPS under stressful conditions)
    2. NEON (it was a really close call, but Quake seemed slightly faster)
    3. 1/sqrtf() (a more noticeable difference: a 1-3 FPS drop)
  • "Formal" Testing: (An app that devours my phone's CPU. It times how long each method takes to get through an array of 10,000,000 randomly generated floats)

    1. NEON (clearly the fastest, and double the speed if it is used to do two sqrts at once)
    2. 1/sqrtf() (only marginally slower than NEON. This surprising result leads me to deem this test "inconclusive" until I investigate further)
    3. Quake (surprisingly, a few orders of magnitude slower than the other two methods, which is especially surprising given its performance in the other test)

While Quake vs. NEON was too close to call in the app performance test, Quake vs. 1/sqrtf() was quite clear-cut in that first test, and the second test was extremely consistent in the values it output. What matters in the end, though, is app performance, so I'm going to base my final decision on the in-app test.
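A timing loop like the "formal" test could be sketched roughly as follows. This is not the harness actually used; `time_method` and `baseline_identity` are hypothetical names. The sketch accumulates every result into a sink value so the compiler cannot throw the calls away as dead code, and it includes a do-nothing baseline so loop overhead can be subtracted from each measurement.

```c
#include <stddef.h>
#include <time.h>

/* Do-nothing candidate: timing it measures loop and call overhead alone. */
float baseline_identity(float x) { return x; }

/* Runs f over data[0..n) and returns the elapsed CPU time in seconds.
   Results are summed into *sink so the calls have an observable effect
   and cannot be optimized out. */
double time_method(float (*f)(float), const float *data, size_t n, float *sink) {
    float acc = 0.0f;
    clock_t start = clock();
    for (size_t i = 0; i < n; ++i)
        acc += f(data[i]);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling each candidate through a function pointer adds the same indirect-call cost to every method, so the numbers are comparable with each other but will not match what an inlined version achieves inside the app.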

WolfLink asked Jan 10 '14 07:01



2 Answers

The accepted answer to the question you've linked already contains the answer, but doesn't spell it out:

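#import <arm_neon.h>

void foo(float someFloat) {
    // vrsqrte_f32 takes a two-lane vector, so broadcast the scalar first.
    float32x2_t input = vdup_n_f32(someFloat);
    // Fast, low-precision estimate of 1/sqrt(x) in each lane.
    float32x2_t inverseSqrt = vrsqrte_f32(input);
    (void)inverseSqrt; // silence the unused-variable warning
}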

The header and the function are already provided by the iOS SDK.
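Note that `vrsqrte_f32` only returns a rough estimate. If you need more precision, the usual approach is to refine it with `vrsqrts_f32`, which computes the Newton-Raphson correction factor. Below is a minimal sketch under a few assumptions: `inv_sqrt_pair` is a made-up helper name, inputs are positive and finite, and the non-NEON branch exists only so the sketch builds on other platforms (it is not part of the iOS SDK approach).

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: writes 1/sqrt(in[i]) to out[i] for two floats. */
void inv_sqrt_pair(const float in[2], float out[2]) {
#if HAVE_NEON
    float32x2_t x = vld1_f32(in);       /* load both inputs */
    float32x2_t est = vrsqrte_f32(x);   /* rough estimate per lane */

    /* vrsqrts_f32(a, b) returns (3 - a*b) / 2, so each line below is
       one Newton-Raphson step: est = est * (3 - x*est*est) / 2. */
    est = vmul_f32(est, vrsqrts_f32(vmul_f32(x, est), est));
    est = vmul_f32(est, vrsqrts_f32(vmul_f32(x, est), est));

    vst1_f32(out, est);                 /* store both results */
#else
    /* Fallback for non-NEON builds only: the same Newton-Raphson
       refinement, seeded by a bit-level estimate. */
    for (int i = 0; i < 2; ++i) {
        uint32_t bits;
        float est;
        memcpy(&bits, &in[i], sizeof bits);
        bits = 0x5f3759dfu - (bits >> 1);
        memcpy(&est, &bits, sizeof est);
        for (int k = 0; k < 3; ++k)
            est = est * (1.5f - 0.5f * in[i] * est * est);
        out[i] = est;
    }
#endif
}
```

Processing two values per call is where the vector version can pay off; for a single scalar, whether it beats the alternatives is something to measure on the target device.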

DarkDust answered Oct 15 '22 00:10


There's a NEON implementation of invsqrt at https://code.google.com/p/math-neon/source/browse/trunk/math_sqrtf.c; you should be able to copy the assembly bit as-is.

Fjölnir answered Oct 15 '22 01:10