Why is there a difference between LongStream reduce and sum performance?

I was using LongStream.rangeClosed to measure how long it takes to sum a range of numbers. When I ran the benchmark below with JMH, I got the following results.

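import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)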
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 1, jvmArgs = {"-Xms4G", "-Xmx4G"})
@State(Scope.Benchmark)
@Warmup(iterations = 10, time = 10)
@Measurement(iterations = 10, time = 10)
public class ParallelStreamBenchmark {
  private static final long N = 10000000L;

  @Benchmark
  public long sequentialSum() {
    return Stream.iterate(1L, i -> i + 1).limit(N).reduce(0L, Long::sum);
  }

  @Benchmark
  public long parallelSum() {
    return Stream.iterate(1L, i -> i + 1).limit(N).parallel().reduce(0L, Long::sum);
  }

  @Benchmark
  public long rangedReduceSum() {
    return LongStream.rangeClosed(1, N).reduce(0, Long::sum);
  }

  @Benchmark
  public long rangedSum() {
    return LongStream.rangeClosed(1, N).sum();
  }

  @Benchmark
  public long parallelRangedReduceSum() {
    return LongStream.rangeClosed(1, N).parallel().reduce(0L, Long::sum);
  }

  @Benchmark
  public long parallelRangedSum() {
    return LongStream.rangeClosed(1, N).parallel().sum();
  }

  @TearDown(Level.Invocation)
  public void tearDown() {
    System.gc();
  }
}

Benchmark                                        Mode  Cnt   Score   Error  Units
ParallelStreamBenchmark.parallelRangedReduceSum  avgt   10   7.895 ± 0.450  ms/op
ParallelStreamBenchmark.parallelRangedSum        avgt   10   1.124 ± 0.165  ms/op
ParallelStreamBenchmark.rangedReduceSum          avgt   10   6.832 ± 0.165  ms/op
ParallelStreamBenchmark.rangedSum                avgt   10  21.564 ± 0.831  ms/op

The only difference between rangedReduceSum and rangedSum is that rangedSum calls the built-in sum() method instead of reduce(0, Long::sum). Why is the performance so different?

I checked that sum() ultimately calls reduce(0, Long::sum), so shouldn't it behave the same as calling reduce(0, Long::sum) directly, as rangedReduceSum does?
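For reference, the equivalence being asked about can be checked with a minimal sketch like the one below. The class name SumVsReduce and the helper names viaReduce and viaSum are made up for illustration. It only compares the results of the two calls, not their speed.

```java
import java.util.stream.LongStream;

// Hypothetical helper class: checks that sum() and reduce(0, Long::sum)
// produce the same value for the same range. It says nothing about timing.
public class SumVsReduce {

    // Same pipeline as rangedReduceSum in the benchmark.
    static long viaReduce(long n) {
        return LongStream.rangeClosed(1, n).reduce(0L, Long::sum);
    }

    // Same pipeline as rangedSum in the benchmark.
    static long viaSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("reduce:       " + viaReduce(n));
        System.out.println("sum:          " + viaSum(n));
        // Closed form n(n+1)/2 as an independent cross-check.
        System.out.println("closed form:  " + (n * (n + 1) / 2));
    }
}
```

All three printed values should agree, which is why a large timing gap between the two benchmark methods is surprising.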

Nick asked May 11 '20 02:05


1 Answer

I ran the same benchmarks as the OP and can reproduce the same result: rangedSum is about 3 times slower than rangedReduceSum. But when I reduce the warmup to a single iteration, things get interesting:

# Benchmark: test.ParallelStreamBenchmark.rangedReduceSum
# Warmup Iteration   1: 3.619 ms/op
Iteration   1: 3.931 ms/op
Iteration   2: 3.927 ms/op
Iteration   3: 3.834 ms/op
Iteration   4: 4.006 ms/op
Iteration   5: 4.605 ms/op
Iteration   6: 6.454 ms/op
Iteration   7: 6.466 ms/op
Iteration   8: 6.328 ms/op
Iteration   9: 6.370 ms/op
Iteration  10: 6.244 ms/op

# Benchmark: test.ParallelStreamBenchmark.rangedSum
# Warmup Iteration   1: 3.971 ms/op
Iteration   1: 4.034 ms/op
Iteration   2: 3.970 ms/op
Iteration   3: 3.957 ms/op
Iteration   4: 4.024 ms/op
Iteration   5: 4.278 ms/op
Iteration   6: 19.302 ms/op
Iteration   7: 19.132 ms/op
Iteration   8: 19.189 ms/op
Iteration   9: 18.842 ms/op
Iteration  10: 18.292 ms/op

Benchmark                                Mode  Cnt   Score    Error  Units
ParallelStreamBenchmark.rangedReduceSum  avgt   10   5.216 ±  1.871  ms/op
ParallelStreamBenchmark.rangedSum        avgt   10  11.502 ± 11.879  ms/op

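Both benchmarks slow down significantly after the 5th measurement iteration. rangedSum jumps from about 4 ms/op to about 19 ms/op, which is roughly 3 times slower than rangedReduceSum over the same iterations (about 6.4 ms/op). If warmup iterations count toward that threshold, then with the OP's 10 warmup iterations it makes sense that the measured iterations are slow from the start. This looks like a bug in the benchmark library, which does not seem to play well with GC (the benchmark forces System.gc() after every invocation). But as the warning says, benchmark results in such cases are for reference only.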
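One way to cross-check the per-iteration pattern outside JMH is a rough hand-timed loop like the sketch below. The class name IterationTimings, the averageMillis helper, and the round and call counts are assumptions made for illustration. Hand-rolled timing is much less reliable than JMH, so the numbers are only a hint about whether the slowdown shows up at all.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical cross-check: times both pipelines round by round with
// System.nanoTime(). Not a replacement for JMH.
public class IterationTimings {
    static final long N = 10_000_000L;

    // Written on every call so the JIT cannot discard the computed sum.
    static volatile long sink;

    // Average wall-clock time per call in milliseconds. When gcAfterEachCall
    // is true, System.gc() runs after every call (outside the timed region),
    // mimicking the question's @TearDown(Level.Invocation).
    static double averageMillis(LongSupplier task, int calls, boolean gcAfterEachCall) {
        long totalNanos = 0L;
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            totalNanos += System.nanoTime() - start;
            if (gcAfterEachCall) {
                System.gc();
            }
        }
        return totalNanos / 1e6 / calls;
    }

    public static void main(String[] args) {
        LongSupplier reduceSum = () -> LongStream.rangeClosed(1, N).reduce(0L, Long::sum);
        LongSupplier sum = () -> LongStream.rangeClosed(1, N).sum();
        for (int round = 1; round <= 10; round++) {
            System.out.printf("round %2d: reduce %.3f ms/op, sum %.3f ms/op%n",
                    round,
                    averageMillis(reduceSum, 20, true),
                    averageMillis(sum, 20, true));
        }
    }
}
```

Running it once with gcAfterEachCall set to true and once with false gives a rough idea of whether the forced GC is involved in the slowdown.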

SwiftMango answered Oct 16 '22 13:10