Is there a fast C or C++ standard library function for double precision inverse square root?

Question

I find myself typing

double foo=1.0/sqrt(...);

a lot, and I've heard that modern processors have built-in inverse square root opcodes.

Is there a C or C++ standard library inverse square root function that

- uses double precision floating point,
- is as accurate as 1.0/sqrt(...), and
- is just as fast as or faster than computing 1.0/sqrt(...)?

Lightness Races in Orbit · Accepted Answer

No. No, there isn't. Not in C++. Nope.

Bram · Answer

I don't know of a standardized C API for this, but that does not mean you cannot use the fast inverse square root instructions, as long as you are willing to use platform-dependent intrinsics.

Take 64-bit x86 with AVX, for example, where you can use _mm256_rsqrt_ps() to approximate the reciprocal of a square root. More specifically, it approximates 8 single-precision reciprocal square roots in a single go, using SIMD.

#include <immintrin.h>

...

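// _mm256_load_ps requires a 32-byte-aligned address.
float inputs[8] __attribute__ ((aligned (32))) = { ... };
__m256 input = _mm256_load_ps(inputs);
// Approximate 1/sqrt(x) for all 8 lanes at once.
__m256 invroot = _mm256_rsqrt_ps(input);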

Similarly, you can use the intrinsic vrsqrteq_f32 on ARM with NEON. In this case, the SIMD is 4-wide, so it will estimate four single-precision inverse square roots in a single go.

#include <arm_neon.h>

...

float32x4_t sqrt_reciprocal = vrsqrteq_f32(x);

Even if you need just one value per batch, this is still faster than a full square root. Just broadcast the input to all lanes of the SIMD register, or set only one lane. That way, you do not need a separate load from memory. On x86, broadcasting to all lanes is done via _mm256_set1_ps(x).
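
Note that these instructions return single-precision estimates (on x86 the documented relative error is below 1.5 * 2^-12), so on their own they do not give the double-precision accuracy the question asks for. A common way to sharpen such an estimate is a Newton-Raphson step, y' = y * (1.5 - 0.5 * x * y * y). The following is a minimal portable sketch of that step, not a standard library facility. The names refine_rsqrt and refine_rsqrt_n are hypothetical, and the initial estimate y is assumed to come from elsewhere, for example lane 0 of the SIMD result converted to double.

```c
/* One Newton-Raphson step for y ~ 1/sqrt(x).
 * Derived from f(y) = 1/(y*y) - x, which gives
 *   y' = y * (1.5 - 0.5 * x * y * y).
 * Hypothetical helper: x must be positive and y a rough
 * estimate of 1/sqrt(x), e.g. from a hardware rsqrt instruction. */
static inline double refine_rsqrt(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Apply the step above `steps` times (hypothetical helper). */
static inline double refine_rsqrt_n(double x, double y, int steps)
{
    for (int i = 0; i < steps; ++i) {
        y = refine_rsqrt(x, y);
    }
    return y;
}
```

Each step roughly doubles the number of correct bits, so an estimate of about 12 bits would need around three steps to approach double precision. Whether that ends up faster than plain 1.0/sqrt(x) depends on the hardware and compiler, so it is worth measuring before relying on it.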

Tags: c++, c, double, sqrt

Dan
