[Short answer: Bad benchmarking methodology. You'd think I'd have figured that out by now.]
The problem is presented as "find a method for calculating x^y quickly, where x and y are positive integers". A typical "fast" algorithm looks like this:
public long fastPower(int x, int y) {
    // Replaced my code with the "better" version described below,
    // but this version isn't measurably faster than what I had before.
    long base = x; // otherwise, we may overflow at base *= base.
    long result = y % 2 == 1 ? x : 1;
    while (y > 1) {
        base *= base;
        y >>= 1;
        if (y % 2 == 1) result *= base;
    }
    return result;
}
I wanted to see how much faster this is than, say, calling Math.pow(), or using a naive approach like multiplying x by itself y times, like this:
public long naivePower(int x, int y) {
    long result = 1;
    for (int i = 0; i < y; i++) {
        result *= x;
    }
    return result;
}
Edit: Okay, it's been pointed out to me (correctly) that my benchmarking code was not consuming the result, and that totally throws everything off. Once I started consuming the result, I'm still seeing the naive approach come in about 25% faster than the "fast" approach.
Original text:
I was very surprised to find that the naive approach was 4x faster than the "fast" version, which was itself about 3x faster than the Math.pow() version.
My test uses 10,000,000 trials (and then 100 million, just to make absolutely sure the JIT has time to warm up), each using random values (to prevent calls from being optimized away) with 2 <= x <= 3 and 25 <= y <= 29. I chose a narrow range of values that wouldn't produce a result greater than 2^63, biased toward larger exponents to give the "fast" version an advantage. I'm pre-generating 10,000 pseudorandom numbers to keep that part of the code out of the timing.
I understand that for small exponents, the naive version could be expected to be faster. The "fast" version has two branches instead of one and generally does twice as many arithmetic/storage operations per iteration as the naive one - but I would expect that for large exponents it would still come out ahead, saving half the operations in the best case and breaking about even in the worst case.
Anyone have any idea why the naive approach would be so much faster than the "fast" version, even with data biased toward the "fast" version (i.e. larger exponents)? Does the extra branch in that code account for that much of a difference at runtime?
Benchmarking code (yes I know I should use some framework for "official" benchmarks, but this is a toy problem) - updated to warm up and consume results:
PowerIf[] powers = new PowerIf[] {
    new EasyPower(), // just calls Math.pow() and cast to int
    new NaivePower(),
    new FastPower()
};

Random rand = new Random(0); // same seed for each run
int randCount = 10000;
int[] bases = new int[randCount];
int[] exponents = new int[randCount];
for (int i = 0; i < randCount; i++) {
    bases[i] = 2 + rand.nextInt(2);
    exponents[i] = 25 + rand.nextInt(5);
}

int count = 1000000000;
for (int trial = 0; trial < powers.length; trial++) {
    long total = 0;
    for (int i = 0; i < count; i++) { // warm up
        final int x = bases[i % randCount];
        final int y = exponents[i % randCount];
        total += powers[trial].power(x, y);
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < count; i++) {
        final int x = bases[i % randCount];
        final int y = exponents[i % randCount];
        total += powers[trial].power(x, y);
    }
    long end = System.currentTimeMillis();
    System.out.printf("%25s: %d ms%n", powers[trial].toString(), (end - start));
    System.out.println(total);
}
Produces output:
                EasyPower: 7908 ms
-407261252961037760
               NaivePower: 1993 ms
-407261252961037760
                FastPower: 2394 ms
-407261252961037760
Playing with the parameters for the random numbers and the trial counts changes the absolute numbers, but the ratios between the tests are consistently the same as those shown.
There are two issues with your fastPower:

1. Replace y % 2 == 0 with (y & 1) == 0; bitwise operations are faster.
2. Your code always decrements y and performs an extra multiplication, including the cases when y is even. It's better to put this part into an else clause.

Anyway, I guess that your benchmarking method is not perfect. A 4x performance difference sounds weird and cannot be explained without seeing the complete code.

After applying the above improvements, I've verified using a JMH benchmark that fastPower is indeed faster than naivePower, by a factor of 1.3x to 2x.
package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class FastPow {
    @Param("3")
    int x;

    @Param({"25", "28", "31", "32"})
    int y;

    @Benchmark
    public long fast() {
        return fastPower(x, y);
    }

    @Benchmark
    public long naive() {
        return naivePower(x, y);
    }

    public static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) {
                x *= x;      // square the base while the exponent is even
                y >>>= 1;
            } else {
                result *= x; // fold the base into the result when it is odd
                y--;
            }
        }
        return result;
    }

    public static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) {
            result *= x;
        }
        return result;
    }
}
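For completeness, here is a minimal sketch of launching the benchmark programmatically (the RunBench class is my own addition; running through the JMH Maven plugin is the more common route):

package bench;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunBench {
    public static void main(String[] args) throws RunnerException {
        // Select the FastPow benchmarks by class name and run them.
        Options opts = new OptionsBuilder()
                .include(FastPow.class.getSimpleName())
                .build();
        new Runner(opts).run();
    }
}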
Results:
Benchmark      (x)  (y)   Mode  Cnt    Score   Error   Units
FastPow.fast     3   25  thrpt   10  103,406 ± 0,664  ops/us
FastPow.fast     3   28  thrpt   10  103,520 ± 0,351  ops/us
FastPow.fast     3   31  thrpt   10   85,390 ± 0,286  ops/us
FastPow.fast     3   32  thrpt   10  115,868 ± 0,294  ops/us
FastPow.naive    3   25  thrpt   10   76,331 ± 0,660  ops/us
FastPow.naive    3   28  thrpt   10   69,527 ± 0,464  ops/us
FastPow.naive    3   31  thrpt   10   54,407 ± 0,231  ops/us
FastPow.naive    3   32  thrpt   10   56,127 ± 0,207  ops/us
Note: integer multiplication is quite a fast operation, sometimes even faster than an extra comparison. Don't expect huge performance improvements for values that fit into a long. The advantage of the fast power algorithm becomes evident on BigInteger with larger exponents.
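To illustrate that last point, here is a minimal square-and-multiply sketch on BigInteger (my own illustration; note that BigInteger.pow already implements this), where each multiplication is genuinely expensive, so performing O(log y) of them instead of O(y) is a real win:

import java.math.BigInteger;

public class BigIntFastPow {

    // Standard square-and-multiply: halve the exponent at each squaring.
    static BigInteger fastPower(BigInteger x, int y) {
        BigInteger result = BigInteger.ONE;
        while (y > 0) {
            if ((y & 1) == 1) {
                result = result.multiply(x); // fold in the current bit
            }
            x = x.multiply(x);               // square for the next bit
            y >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger three = BigInteger.valueOf(3);
        // Sanity check against the built-in implementation.
        System.out.println(fastPower(three, 1000).equals(three.pow(1000))); // true
    }
}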
Since the author posted the benchmark, I must admit that the surprising performance results come from common benchmarking pitfalls. I've improved the benchmark while retaining the original methodology, and now it shows that FastPower is indeed faster than NaivePower; see here.
What are the key changes in the improved version? Most notably, y % 2 is replaced with y & 1, since HotSpot does not perform this optimization automatically. Writing microbenchmarks manually is a difficult task; that's why it is strongly recommended to use a proper benchmarking framework like JMH.
Without being able to review and replicate your benchmark there's little point in trying to break down your results. They could be due to a poor selection of inputs, erroneous benchmarking practices such as running one test before another (thus giving the JVM time to "warm up"), and so on. Please share your benchmarking code, not just your results.
I would suggest including Guava's LongMath.pow() (src) in your tests; it is a heavily used and well-benchmarked method. While you might be able to beat it with certain inputs, it's unlikely you can improve on its runtime in the general case (and if you can, they'd love to hear about it).
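For reference, a minimal usage sketch (the class name is mine; assumes Guava on the classpath):

import com.google.common.math.LongMath;

public class GuavaPowDemo {
    public static void main(String[] args) {
        // Overflows silently for large results, like the hand-written versions above.
        System.out.println(LongMath.pow(3, 25));        // 847288609443
        // Throws ArithmeticException on overflow instead of wrapping.
        System.out.println(LongMath.checkedPow(3, 25)); // 847288609443
    }
}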
It is unsurprising that Math.pow() performs worse than positive-integer-only algorithms. Looking at the "fast" vs. "naive" implementations, the outcome is clearly very much dependent on the inputs you choose, as Mike 'Pomax' Kamermans suggests. For small values of y the "naive" solution obviously has to do less work, but for larger values the "fast" implementation saves a good number of iterations, as the sketch below illustrates.