 

"Fast" Integer Powers in Java

[Short answer: Bad benchmarking methodology. You'd think I'd have figured that out by now.]

The problem is presented as "find a method for calculating x^y quickly, where x and y are positive integers". A typical "fast" algorithm looks like this:

public long fastPower(int x, int y) {
  // Replaced my code with the "better" version described below,
  // but this version isn't measurably faster than what I had before
  long base = x; // use a long so that base *= base doesn't overflow int
  long result = y % 2 == 1 ? base : 1;
  while (y > 1) {
    base *= base;
    y >>= 1;
    if (y % 2 == 1) result *= base;
  }

  return result;
}

I wanted to see how much faster this is than, say, calling Math.pow(), or using a naive approach that multiplies x by itself y times, like this:

public long naivePower(int x, int y) {
  long result = 1;
  for (int i = 0; i < y; i++) {
    result *= x;
  }
  return result;
}

Edit: Okay, it's been pointed out to me (correctly) that my benchmarking code was not consuming the result, and that totally throws everything off. Once I started consuming the result, I still see the naive approach being about 25% faster than the "fast" approach.

Original text:

I was very surprised to find that the naive approach was 4x faster than the "fast" version, which was itself about 3x faster than the Math.pow() version.

My test uses 10,000,000 trials (and then 100 million, just to make absolutely sure the JIT has time to warm up), each using random values (to prevent calls from being optimized away) with 2 <= x <= 3 and 25 <= y <= 29. I chose a narrow range of values that wouldn't produce a result greater than 2^63, but biased them toward larger exponents to try to give the "fast" version an advantage. I'm pre-generating 10,000 pseudorandom numbers to eliminate that part of the code from the timing.

I understand that for small exponents, the naive version could be expected to be faster. The "fast" version has two branches instead of one, and generally does twice as many arithmetic/storage operations per iteration as the naive one - but for large exponents, I would still expect the fast approach to save about half the operations in the best case, and to break roughly even in the worst case.
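To sanity-check that operation-count intuition, here is a small hypothetical helper (not part of the benchmark above) that mirrors the two loops and simply counts multiplications:

```java
public class MultiplicationCount {
    // Multiplications performed by the naive loop: one per iteration.
    static int naiveMuls(int y) {
        return y;
    }

    // Multiplications performed by the "fast" loop above: one squaring
    // each time y is halved, plus one result multiply whenever the
    // shifted y is odd (the lowest bit is handled before the loop).
    static int fastMuls(int y) {
        int muls = 0;
        while (y > 1) {
            muls++;                    // base *= base
            y >>= 1;
            if ((y & 1) == 1) muls++;  // result *= base
        }
        return muls;
    }

    public static void main(String[] args) {
        for (int y : new int[] {25, 29, 1000}) {
            System.out.printf("y=%d: naive=%d fast=%d%n",
                    y, naiveMuls(y), fastMuls(y));
        }
    }
}
```

For the exponents in this test (25 <= y <= 29), the fast loop does only 6-7 multiplications against 25-29 for the naive loop, so the raw multiply count alone can't explain the timing results.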

Anyone have any idea why the naive approach would be so much faster than the "fast" version, even with data biased toward the "fast" version (i.e. larger exponents)? Does the extra branch in that code account for that much of a difference at runtime?

Benchmarking code (yes I know I should use some framework for "official" benchmarks, but this is a toy problem) - updated to warm up, and consume results:

PowerIf[] powers = new PowerIf[] {
  new EasyPower(), // just calls Math.pow() and cast to int
  new NaivePower(),
  new FastPower()
};

Random rand = new Random(0); // same seed for each run
int randCount = 10000;
int[] bases = new int[randCount];
int[] exponents = new int[randCount];
for (int i = 0; i < randCount; i++) {
  bases[i] = 2 + rand.nextInt(2);
  exponents[i] = 25 + rand.nextInt(5);
}

int count = 1000000000;

for (int trial = 0; trial < powers.length; trial++) {
  long total = 0;
  for (int i = 0; i < count; i++) { // warm up
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long start = System.currentTimeMillis();
  for (int i = 0; i < count; i++) {
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long end = System.currentTimeMillis();
  System.out.printf("%25s: %d ms%n", powers[trial].toString(), (end - start)); 
  System.out.println(total);
}

Produces output:

                EasyPower: 7908 ms
-407261252961037760
               NaivePower: 1993 ms
-407261252961037760
                FastPower: 2394 ms
-407261252961037760

Playing with the parameters for the random numbers and the trials does change the output characteristics, but the ratios between the tests are always consistently the same as those shown.

asked Feb 27 '16 by Brian



2 Answers

There are two issues with your fastPower:

  1. It's better to replace y % 2 == 0 with (y & 1) == 0; bitwise operations are faster.
  2. Your code always decrements y and performs an extra multiplication, even when y is even. It's better to put that part in an else clause.

Anyway, I suspect your benchmarking methodology is not perfect. A 4x performance difference sounds weird and cannot be explained without seeing the complete code.

After applying the above improvements, I've verified using a JMH benchmark that fastPower is indeed faster than naivePower, by a factor of 1.3x to 2x.

package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class FastPow {
    @Param("3")
    int x;
    @Param({"25", "28", "31", "32"})
    int y;

    @Benchmark
    public long fast() {
        return fastPower(x, y);
    }

    @Benchmark
    public long naive() {
        return naivePower(x, y);
    }

    public static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) {
                x *= x;
                y >>>= 1;
            } else {
                result *= x;
                y--;
            }
        }
        return result;
    }

    public static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) {
            result *= x;
        }
        return result;
    }
}

Results:

Benchmark      (x)  (y)   Mode  Cnt    Score   Error   Units
FastPow.fast     3   25  thrpt   10  103,406 ± 0,664  ops/us
FastPow.fast     3   28  thrpt   10  103,520 ± 0,351  ops/us
FastPow.fast     3   31  thrpt   10   85,390 ± 0,286  ops/us
FastPow.fast     3   32  thrpt   10  115,868 ± 0,294  ops/us
FastPow.naive    3   25  thrpt   10   76,331 ± 0,660  ops/us
FastPow.naive    3   28  thrpt   10   69,527 ± 0,464  ops/us
FastPow.naive    3   31  thrpt   10   54,407 ± 0,231  ops/us
FastPow.naive    3   32  thrpt   10   56,127 ± 0,207  ops/us

Note: Integer multiplication is a quite fast operation, sometimes even faster than an extra comparison. Don't expect huge performance improvements for values that fit into a long. The advantage of the fast power algorithm becomes evident on BigInteger with larger exponents.
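To illustrate that last point, here is a sketch of the same square-and-multiply idea over BigInteger, where each multiplication is genuinely expensive, so halving the multiply count pays off (BigInteger.pow() already does something similar internally):

```java
import java.math.BigInteger;

public class BigIntFastPow {
    // Square-and-multiply on BigInteger: O(log y) multiplications
    // instead of the O(y) a naive repeated-multiply loop would need.
    static BigInteger fastPower(BigInteger x, int y) {
        BigInteger result = BigInteger.ONE;
        while (y > 0) {
            if ((y & 1) == 1) {
                result = result.multiply(x);
            }
            x = x.multiply(x);
            y >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger r = fastPower(BigInteger.valueOf(3), 100);
        // Cross-check against the library implementation.
        System.out.println(r.equals(BigInteger.valueOf(3).pow(100)));
    }
}
```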

Update

Since the author posted the benchmark, I must admit that the surprising performance results come from common benchmarking pitfalls. I've improved the benchmark while retaining the original methodology, and it now shows that FastPower is indeed faster than NaivePower, see here.

What are the key changes in the improved version?

  1. Different algorithms should be tested separately in different JVM instances to prevent profile pollution.
  2. A benchmark must be called several times to allow proper compilation/recompilation until the steady state is reached.
  3. One benchmark trial should be placed in a separate method to avoid on-stack replacement issues.
  4. y % 2 is replaced with y & 1 since HotSpot does not perform this optimization automatically.
  5. Minimized the effect of unrelated operations in the main benchmark loop.
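Point 3 above can be sketched roughly like this (a simplified illustration, not the actual improved benchmark):

```java
public class BenchSketch {
    // Keeping the hot loop in its own method lets HotSpot compile the
    // whole method normally, instead of relying on on-stack replacement
    // of a long-running loop inside main().
    static long runTrial(int[] bases, int[] exponents, int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += naivePower(bases[i % bases.length],
                                exponents[i % exponents.length]);
        }
        return total; // returning the sum keeps the work from being dead code
    }

    static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) result *= x;
        return result;
    }

    public static void main(String[] args) {
        int[] bases = {2, 3};
        int[] exponents = {25, 29};
        // Call the trial method repeatedly so it reaches a compiled
        // steady state before the timings are trusted (point 2 above).
        for (int rep = 0; rep < 5; rep++) {
            long start = System.nanoTime();
            long total = runTrial(bases, exponents, 1_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.printf("rep %d: %d ns (total=%d)%n", rep, elapsed, total);
        }
    }
}
```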

Writing microbenchmarks by hand is a difficult task. That's why it is strongly recommended to use a proper benchmarking framework like JMH.

answered Sep 22 '22 by apangin


Without being able to review and replicate your benchmark there's little point in trying to break down your results. They could be due to a poor selection of inputs, erroneous benchmarking practices such as running one test before another (thus giving the JVM time to "warm up"), and so on. Please share your benchmarking code, not just your results.

I would suggest including Guava's LongMath.pow() (src) in your tests; it is a heavily used and well-benchmarked method. While you might be able to beat it with certain inputs, it's unlikely you can improve on its runtime in the general case (and if you can, they'd love to hear about it).

It is unsurprising that Math.pow() performs worse than positive-integer-only algorithms. Comparing the "fast" and "naive" implementations, the outcome clearly depends very much on the inputs you choose, as Mike 'Pomax' Kamermans suggests. For small values of y the "naive" solution obviously has to do less work, but for larger values the "fast" implementation saves a good number of iterations.

answered Sep 19 '22 by dimo414