Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is fast inverse square root so odd and slow on Java?

I'm trying to implement Fast Inverse Square Root on java in order to speed up vector normalization. However, when I implement the single-precision version in Java, I get speeds about the same as 1F / (float)Math.sqrt() at first, then quickly drops to half the speed. This is interesting, because while Math.sqrt uses (I presume) a native method, this involves floating point division, which I've heard is really slow. My code for computing the numbers is as follows:

public static float fastInverseSquareRoot(float x){
    float xHalf = 0.5F * x;
    int temp = Float.floatToRawIntBits(x);
    temp = 0x5F3759DF - (temp >> 1);
    float newX = Float.intBitsToFloat(temp);
    newX = newX * (1.5F - xHalf * newX * newX);
    return newX;
}

Using a short program I've written to iterate each 16 million times, then aggregate results, and repeat, I get results like this:

1F / Math.sqrt() took 65209490 nanoseconds.
Fast Inverse Square Root took 65456128 nanoseconds.
Fast Inverse Square Root was 0.378224 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 64131293 nanoseconds.
Fast Inverse Square Root took 26214534 nanoseconds.
Fast Inverse Square Root was 59.123647 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 27312205 nanoseconds.
Fast Inverse Square Root took 56234714 nanoseconds.
Fast Inverse Square Root was 105.895914 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 26493281 nanoseconds.
Fast Inverse Square Root took 56004783 nanoseconds.
Fast Inverse Square Root was 111.392402 percent slower than 1F / Math.sqrt()

I consistently get numbers which are about the same speed for both, followed by an iteration where Fast Inverse Square Root saves about 60 percent of the time required by 1F / Math.sqrt(), followed by several iterations which take about twice as long for Fast Inverse Square Root to run as the control. I'm confused why FISR would go from Same -> 60 percent faster -> 100 percent slower, and it happens every time I run my program.

EDIT: The above data is when I run it in eclipse. When I run the program with javac/java I get completely different data:

1F / Math.sqrt() took 57870498 nanoseconds.
Fast Inverse Square Root took 88206794 nanoseconds.
Fast Inverse Square Root was 52.421004 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 54982400 nanoseconds.
Fast Inverse Square Root took 83777562 nanoseconds.
Fast Inverse Square Root was 52.371599 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 21115822 nanoseconds.
Fast Inverse Square Root took 76705152 nanoseconds.
Fast Inverse Square Root was 263.259133 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 20159210 nanoseconds.
Fast Inverse Square Root took 80745616 nanoseconds.
Fast Inverse Square Root was 300.539585 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 21814675 nanoseconds.
Fast Inverse Square Root took 85261648 nanoseconds.
Fast Inverse Square Root was 290.845374 percent slower than 1F / Math.sqrt()

EDIT2: After a few responses, it seems the speed stabilizes after several iterations, but the number it stabilizes to is highly volatile. Anyone have any idea why?

Here's my code (not exactly concise, but here's the whole thing):

public class FastInverseSquareRootTest {

    public static FastInverseSquareRootTest conductTest() {
        float result = 0F;
        long startTime, endTime, midTime;
        startTime = System.nanoTime();
        for (float x = 1F; x < 4_000_000F; x += 0.25F) {
            result = 1F / (float) Math.sqrt(x);
        }
        midTime = System.nanoTime();
        for (float x = 1F; x < 4_000_000F; x += 0.25F) {
            result = fastInverseSquareRoot(x);
        }
        endTime = System.nanoTime();
        return new FastInverseSquareRootTest(midTime - startTime, endTime
                - midTime);
    }

    public static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int temp = Float.floatToRawIntBits(x);
        temp = 0x5F3759DF - (temp >> 1);
        float newX = Float.intBitsToFloat(temp);
        newX = newX * (1.5F - xHalf * newX * newX);
        return newX;
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 7; i++) {
            System.out.println(conductTest().toString());
        }
    }

    private long controlDiff;

    private long experimentalDiff;

    private double percentError;

    public FastInverseSquareRootTest(long controlDiff, long experimentalDiff) {
        this.experimentalDiff = experimentalDiff;
        this.controlDiff = controlDiff;
        this.percentError = 100D * (experimentalDiff - controlDiff)
                / controlDiff;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("1F / Math.sqrt() took %d nanoseconds.%n",
                controlDiff));
        sb.append(String.format(
                "Fast Inverse Square Root took %d nanoseconds.%n",
                experimentalDiff));
        sb.append(String
                .format("Fast Inverse Square Root was %f percent %s than 1F / Math.sqrt()%n",
                        Math.abs(percentError), percentError > 0D ? "slower"
                                : "faster"));
        return sb.toString();
    }
}
like image 970
Leo Izen Avatar asked May 14 '13 19:05

Leo Izen


1 Answers

The JIT optimiser seems to have thrown the call to Math.sqrt away.

With your unmodified code, I got

1F / Math.sqrt() took 65358495 nanoseconds.
Fast Inverse Square Root took 77152791 nanoseconds.
Fast Inverse Square Root was 18,045544 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 52872498 nanoseconds.
Fast Inverse Square Root took 75242075 nanoseconds.
Fast Inverse Square Root was 42,308531 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23386359 nanoseconds.
Fast Inverse Square Root took 73532080 nanoseconds.
Fast Inverse Square Root was 214,422951 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23790209 nanoseconds.
Fast Inverse Square Root took 76254902 nanoseconds.
Fast Inverse Square Root was 220,530610 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23885467 nanoseconds.
Fast Inverse Square Root took 74869636 nanoseconds.
Fast Inverse Square Root was 213,452678 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23473514 nanoseconds.
Fast Inverse Square Root took 73063699 nanoseconds.
Fast Inverse Square Root was 211,260168 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23738564 nanoseconds.
Fast Inverse Square Root took 71917013 nanoseconds.
Fast Inverse Square Root was 202,954353 percent slower than 1F / Math.sqrt()

consistently slower times for fastInverseSquareRoot, and the times for that are all in the same ball-park, while the Math.sqrt calls sped up considerably.

Changing the code so that the calls to Math.sqrt couldn't be avoided,

    for (float x = 1F; x < 4_000_000F; x += 0.25F) {
        result += 1F / (float) Math.sqrt(x);
    }
    midTime = System.nanoTime();
    for (float x = 1F; x < 4_000_000F; x += 0.25F) {
        result -= fastInverseSquareRoot(x);
    }
    endTime = System.nanoTime();
    if (result == 0) System.out.println("Wow!");

I got

1F / Math.sqrt() took 184884684 nanoseconds.
Fast Inverse Square Root took 85298761 nanoseconds.
Fast Inverse Square Root was 53,863804 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 182183542 nanoseconds.
Fast Inverse Square Root took 83040574 nanoseconds.
Fast Inverse Square Root was 54,419278 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 165269658 nanoseconds.
Fast Inverse Square Root took 81922280 nanoseconds.
Fast Inverse Square Root was 50,431143 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 163272877 nanoseconds.
Fast Inverse Square Root took 81906141 nanoseconds.
Fast Inverse Square Root was 49,834815 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 165314846 nanoseconds.
Fast Inverse Square Root took 81124465 nanoseconds.
Fast Inverse Square Root was 50,927296 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 164079534 nanoseconds.
Fast Inverse Square Root took 80453629 nanoseconds.
Fast Inverse Square Root was 50,966689 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 162350821 nanoseconds.
Fast Inverse Square Root took 79854355 nanoseconds.
Fast Inverse Square Root was 50,813704 percent faster than 1F / Math.sqrt()

much slower times for Math.sqrt, and only moderately slower times for fastInverseSqrt (now it had to do a subtraction in each iteration).

like image 91
Daniel Fischer Avatar answered Sep 24 '22 22:09

Daniel Fischer