On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).
I'm wondering if there are any algorithms out there for computating a fast approximation to x/y
, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y
, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?
I hope that this helps because this is probably as close as your going to get to what you are looking for.
__inline__ double __attribute__((const)) divide( double y, double x ) {
// calculates y/x
union {
double dbl;
unsigned long long ull;
} u;
u.dbl = x; // x = x
u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
// pow( x, -0.5 )
u.dbl *= u.dbl; // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
return u.dbl * y; // (1.0/x) * y = y/x
}
See also:
Another post about reciprocal approximation.
The Wikipedia page.
FDIV is usually exceptionally slower than FMUL just b/c it can't be piped like multiplication and requires multiple clk cycles for iterative convergence HW seeking process.
Easiest way is to simply recognize that division is nothing more than the multiplication of the dividend y
and the inverse of the divisor x
. The not so straight forward part is remembering a float value x = m * 2 ^ e
& its inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q
approximating this new mantissa p = 2/m = 3-x, for 1<=m<2
. This gives a rough piece-wise linear approximation of the inverse function, however we can do a lot better by using an iterative Newton Root Finding Method to improve that approximation.
let w = f(x) = 1/x
, the inverse of this function f(x)
is found by solving for x
in terms of w
or x = f^(-1)(w) = 1/w
. To improve the output with the root finding method we must first create a function whose zero reflects the desired output, i.e. g(w) = 1/w - x, d/dw(g(w)) = -1/w^2
.
w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])
w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x
These components then add to get the final piece of code:
float inv_fast(float x) {
union { float f; int i; } v;
float w, sx;
int m;
sx = (x < 0) ? -1:1;
x = sx * x;
v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
w = x * v.f;
// Efficient Iterative Approximation Improvement in horner polynomial form.
v.f = v.f * (2 - w); // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
// v.f = v.f * ( 4 + w * (-6 + w * (4 - w))); // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
// v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w))))))); // Third Iteration, Err = +-6.8e-8 * 2^(-flr(log2(x)))
return v.f * sx;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With