This is just to satisfy my own curiosity.
Is there an implementation of this:
float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
in Rust? If it exists, post the code.
I tried it and failed. I don't know how to encode the float number using integer format. Here is my attempt:
fn main() {
    println!("Hello, world!");
    println!("sqrt1: {}, ", sqrt2(100f64));
}

fn sqrt1(x: f64) -> f64 {
    x.sqrt()
}

fn sqrt2(x: f64) -> f64 {
    let mut x = x;
    let xhalf = 0.5*x;
    let mut i = x as i64;
    println!("sqrt1: {}, ", i);
    i = 0x5f375a86 as i64 - (i>>1);
    x = i as f64;
    x = x*(1.5f64 - xhalf*x*x);
    1.0/x
}
References:
1. Origin of Quake3's Fast InvSqrt() - Page 1
2. Understanding Quake’s Fast Inverse Square Root
3. FAST INVERSE SQUARE ROOT.pdf
4. source code: q_math.c#L552-L572
Many people have also asked whether it is still useful on modern processors. The most common answer is no: this hack was useful 20+ years ago, but modern processors have better and faster ways to approximate the inverse square root.
John Carmack has a special function in the Quake III source code which calculates the inverse square root of a float, 4x faster than the regular (float)(1.0/sqrt(x)), using a strange 0x5f3759df constant.
I don't know how to encode the float number using integer format.
There is a function for that: f32::to_bits, which returns a u32. There is also a function for the other direction: f32::from_bits, which takes a u32 as argument. These functions are preferred over mem::transmute, as the latter is unsafe and tricky to use.
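For example, a quick round trip (my own illustration, not part of the original answer) shows what these two functions do:

fn main() {
    let x = 1.0f32;
    let bits = x.to_bits();              // raw IEEE-754 bit pattern as a u32
    println!("{bits:#010x}");            // prints 0x3f800000 for 1.0
    assert_eq!(f32::from_bits(bits), x); // from_bits is the exact inverse
}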
With that, here is the implementation of InvSqrt:
fn inv_sqrt(x: f32) -> f32 {
    let i = x.to_bits();
    let i = 0x5f3759df - (i >> 1);   // magic initial guess
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y)      // one Newton-Raphson step
}
(Playground)
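As a quick sanity check (my own addition, not from the answer), the magic constant plus one Newton step keeps the result within roughly 0.2% of the exact value:

fn main() {
    for x in [0.25f32, 1.0, 2.0, 100.0] {
        let approx = inv_sqrt(x);
        let exact = 1.0 / x.sqrt();
        // Relative error stays around 0.1-0.2% for normal inputs.
        println!("x = {x:>6}: approx = {approx:.6}, exact = {exact:.6}");
    }
}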
This function compiles to the following assembly on x86-64:
.LCPI0_0:
        .long   3204448256                           ; f32 -0.5
.LCPI0_1:
        .long   1069547520                           ; f32 1.5
example::inv_sqrt:
        movd    eax, xmm0
        shr     eax                                  ; i >> 1
        mov     ecx, 1597463007                      ; 0x5f3759df
        sub     ecx, eax                             ; 0x5f3759df - ...
        movd    xmm1, ecx
        mulss   xmm0, dword ptr [rip + .LCPI0_0]     ; x *= -0.5
        mulss   xmm0, xmm1                           ; x *= y
        mulss   xmm0, xmm1                           ; x *= y
        addss   xmm0, dword ptr [rip + .LCPI0_1]     ; x += 1.5
        mulss   xmm0, xmm1                           ; x *= y
        ret
I have not found any reference assembly (if you have, please tell me!), but it seems fairly good to me. I am just not sure why the float was moved into eax just to do the shift and integer subtraction. Maybe SSE registers do not support those operations?

clang 9.0 with -O3 compiles the C code to basically the same assembly. So that's a good sign.
It is worth pointing out that if you actually want to use this in practice: please don't. As benrg pointed out in the comments, modern x86 CPUs have a specialized instruction for this function which is faster and more accurate than this hack. Unfortunately, 1.0 / x.sqrt() does not seem to optimize to that instruction. So if you really need the speed, using the _mm_rsqrt_ps intrinsic is probably the way to go. This, however, again requires unsafe code. I won't go into much detail in this answer, as only a minority of programmers will actually need it.
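For the curious, a minimal sketch of what that could look like with the std::arch intrinsics (the wrapper function and the choice of the scalar _mm_rsqrt_ss variant are my own; the original answer does not spell this out):

#[cfg(target_arch = "x86_64")]
fn inv_sqrt_fast(x: f32) -> f32 {
    use std::arch::x86_64::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};
    // SSE is part of the x86-64 baseline, so these intrinsics are available on
    // any x86-64 CPU; the rsqrtss instruction gives roughly 12 bits of precision.
    unsafe { _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))) }
}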
This one is implemented with the lesser-known union feature in Rust:
union FI {
    f: f32,
    i: i32,
}

fn inv_sqrt(x: f32) -> f32 {
    let mut u = FI { f: x };
    unsafe {
        u.i = 0x5f3759df - (u.i >> 1);
        u.f * (1.5 - 0.5 * x * u.f * u.f)
    }
}
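For completeness, the mem::transmute variant that shows up in the benchmark below would look roughly like this (my own sketch; as noted in the first answer, to_bits/from_bits is the preferred spelling):

fn inv_sqrt_transmute(x: f32) -> f32 {
    // Reinterpret the float's bits as a u32 via transmute instead of to_bits.
    let i: u32 = unsafe { std::mem::transmute(x) };
    let i = 0x5f3759df - (i >> 1);
    let y: f32 = unsafe { std::mem::transmute(i) };
    y * (1.5 - 0.5 * x * y * y)
}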
Did some micro benchmarks using the criterion crate on an x86-64 Linux box. Surprisingly, Rust's own sqrt().recip() is the fastest. But of course, any micro benchmark result should be taken with a grain of salt.
inv sqrt with transmute          time: [1.6605 ns 1.6638 ns 1.6679 ns]
inv sqrt with union              time: [1.6543 ns 1.6583 ns 1.6633 ns]
inv sqrt with to and from bits   time: [1.7659 ns 1.7677 ns 1.7697 ns]
inv sqrt with powf               time: [7.1037 ns 7.1125 ns 7.1223 ns]
inv sqrt with sqrt then recip    time: [1.5466 ns 1.5488 ns 1.5513 ns]
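For reference, a benchmark like this could be set up roughly as follows with criterion (a sketch under my own assumptions, not the answerer's actual harness; inv_sqrt is the to_bits/from_bits version from the first answer, and the powf variant is assumed to be x.powf(-0.5)):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_inv_sqrt(c: &mut Criterion) {
    // inv_sqrt from the first answer, assumed to be in scope here.
    let x = 1.2345f32;
    c.bench_function("inv sqrt with to and from bits", |b| {
        b.iter(|| inv_sqrt(black_box(x)))
    });
    c.bench_function("inv sqrt with powf", |b| {
        b.iter(|| black_box(x).powf(-0.5))
    });
    c.bench_function("inv sqrt with sqrt then recip", |b| {
        b.iter(|| black_box(x).sqrt().recip())
    });
}

criterion_group!(benches, bench_inv_sqrt);
criterion_main!(benches);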