I'm working on an iPhone app that runs certain physics calculations thousands of times per second, and I am optimizing the code to improve the framerate. One of the pieces I am looking at is the inverse square root. Right now, I am using the Quake 3 fast inverse square root method. After doing some research, however, I heard that there is a faster way using the NEON instruction set. I am unfamiliar with inline assembly and cannot figure out how to use NEON. I tried implementing the math-neon library, but I get compiler errors because most of the NEON-based functions lack return statements.
EDIT: I've suddenly been getting some "unclear question" close votes. Although I think it's quite clear and those who answered obviously understood, maybe some people need it stated explicitly: How do you use NEON to perform faster calculations? And is it really the fastest method for getting the inverse square root on the iPhone?
EDIT: I did some more formal testing on NEON vs. Quake today, but if anything, I'm even more uncertain about the outcome now:
In-App Testing: (An app that is currently in the App Store with its invsqrt method modified)
"Formal" Testing: (An app that devours my phone's CPU. It times how long each method takes to get through an array of 10000000 randomly generated floats)
While Quake vs. NEON was too close to call in the app performance test, Quake vs. 1/sqrtf() was quite clear-cut in the first test, and the second test produced extremely consistent values. What matters in the end, though, is app performance, so I'm going to make my final decision based on that test.
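For anyone who wants to reproduce the array-based test, here is a minimal sketch of such a timing harness. It is not the code I actually ran: the names `inv_sqrt_fn`, `placeholder_method`, `fill_random` and `time_method` are made up for illustration, and `placeholder_method` is only a stand-in that should be swapped for the method under test.

```c
#include <stdlib.h>
#include <time.h>

/* Signature shared by every method being compared (hypothetical name). */
typedef float (*inv_sqrt_fn)(float);

/* Hypothetical stand-in so the sketch compiles on its own;
   replace it with the inverse square root method under test. */
static float placeholder_method(float x)
{
    return 1.0f / x;
}

/* Fills data with n pseudo-random positive floats between 1 and 2. */
static void fill_random(float *data, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; ++i)
        data[i] = 1.0f + (float)(rand() / ((double)RAND_MAX + 1.0));
}

/* Returns the CPU seconds spent on one pass of fn over data.
   The running sum is written to *checksum so the compiler
   cannot discard the calls as dead code. */
static double time_method(inv_sqrt_fn fn, const float *data, size_t n,
                          float *checksum)
{
    float sum = 0.0f;
    clock_t start = clock();
    for (size_t i = 0; i < n; ++i)
        sum += fn(data[i]);
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `fill_random` once, then call `time_method` for each candidate on the same array so every method sees identical input. Note that `clock()` reports CPU time, so a wall-clock timer may be a better fit if other threads are busy.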
As shown below, the SSE_InvSqrt function is the fastest algorithm for computing 1 / sqrt(x) with reasonable precision. However, the standard sqrt function can provide more or less the same performance on a SISD architecture, and with definitely better portability and maintainability.
A single Newton-Raphson iteration is performed to calculate a more accurate approximation of the inverse square root of the input. The result of the Newton-Raphson iteration is the return value of the function. The result is extremely accurate with a maximum error of 0.175%.
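As a rough illustration of the steps described above, here is a sketch of the widely published form of the algorithm in C. The function name `fast_inv_sqrt` is chosen for this example, and `memcpy` is used in place of the original pointer cast to avoid undefined behavior; it assumes a positive, finite, normal input and 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the bit-trick inverse square root followed by one
   Newton-Raphson step. Assumes x is positive, finite and normal. */
static float fast_inv_sqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);     /* read the float's raw bits */
    bits = 0x5f3759dfu - (bits >> 1);   /* magic-constant initial guess */
    memcpy(&y, &bits, sizeof y);        /* back to a float */

    /* One Newton-Raphson step: y <- y * (1.5 - 0.5 * x * y * y) */
    return y * (1.5f - 0.5f * x * y * y);
}
```

Repeating the last line a second time would tighten the error further at the cost of a few extra multiplications.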
The algorithm is not copyrighted, but the source code of the function is copyrighted. You could learn how the algorithm works by reading the function, and then write your own function that implements the same algorithm.
The accepted answer of the question you've linked already provides the answer, but doesn't spell it out:
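#import <arm_neon.h>
float foo(float someFloat) {
    // vrsqrte_f32 takes a two-lane vector, so broadcast the scalar first.
    float32x2_t input = vdup_n_f32(someFloat);
    // Each lane now holds a rough estimate of 1/sqrt(someFloat).
    float32x2_t inverseSqrt = vrsqrte_f32(input);
    return vget_lane_f32(inverseSqrt, 0);
}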
Header and function are already provided by the iOS SDK.
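Keep in mind that `vrsqrte_f32` only returns a coarse estimate (on the order of 8 bits of precision), so it is usually paired with `vrsqrts_f32`, which computes the Newton-Raphson correction factor (3 - a * b) / 2. A minimal sketch follows, assuming two refinement steps are enough for your use case. The name `neon_inv_sqrt` is made up for this example, and the `#else` branch is only a scalar stand-in so the snippet also compiles on non-ARM machines.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* NEON path: hardware estimate plus two Newton-Raphson refinements. */
static float neon_inv_sqrt(float value)
{
    float32x2_t x = vdup_n_f32(value);   /* broadcast to both lanes */
    float32x2_t e = vrsqrte_f32(x);      /* coarse estimate of 1/sqrt(x) */

    /* vrsqrts_f32(a, b) returns (3 - a * b) / 2, so each line below
       performs e <- e * (3 - x * e * e) / 2. */
    e = vmul_f32(e, vrsqrts_f32(vmul_f32(x, e), e));
    e = vmul_f32(e, vrsqrts_f32(vmul_f32(x, e), e));

    return vget_lane_f32(e, 0);
}

#else

/* Scalar stand-in for non-ARM builds (an assumption of this sketch):
   bit-trick initial guess followed by the same two refinement steps. */
static float neon_inv_sqrt(float value)
{
    uint32_t bits;
    float e;

    memcpy(&bits, &value, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&e, &bits, sizeof e);

    e = e * (3.0f - value * e * e) * 0.5f;
    e = e * (3.0f - value * e * e) * 0.5f;
    return e;
}

#endif
```

Since the two-lane type processes two floats per instruction (and `float32x4_t` with the `q` variants processes four), the real gain tends to come from batching several inputs per call rather than converting one scalar at a time.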
https://code.google.com/p/math-neon/source/browse/trunk/math_sqrtf.c <- there's a neon implementation of invsqrt there, you should be able to copy the assembly bit as-is