I've created a benchmark for a method that finds the longest common subsequence using dynamic programming:
@Benchmark
def longestCommonSubsequenceDP(): String = {
  val s1 = "Pellentesque lacinia"
  val s2 = "Mauris purus massa"
  val s1Length = s1.length
  val s2Length = s2.length
  // lcsLengths(i)(j) holds the LCS length of the prefixes s1[0, i) and s2[0, j)
  val lcsLengths = Array.fill[Int](s1Length + 1, s2Length + 1)(0)
  for (i <- 0 until s1Length) {
    for (j <- 0 until s2Length) {
      if (s1.charAt(i) == s2.charAt(j)) {
        lcsLengths(i + 1)(j + 1) = lcsLengths(i)(j) + 1
      } else {
        lcsLengths(i + 1)(j + 1) = math.max(lcsLengths(i)(j + 1), lcsLengths(i + 1)(j))
      }
    }
  }
  // Walk the table backwards to recover one longest common subsequence
  val subSeq = new StringBuilder()
  var s1Pos = s1Length
  var s2Pos = s2Length
  while (s1Pos > 0 && s2Pos > 0) {
    if (lcsLengths(s1Pos)(s2Pos) == lcsLengths(s1Pos - 1)(s2Pos)) {
      s1Pos -= 1
    } else if (lcsLengths(s1Pos)(s2Pos) == lcsLengths(s1Pos)(s2Pos - 1)) {
      s2Pos -= 1
    } else {
      assert(s1.charAt(s1Pos - 1) == s2.charAt(s2Pos - 1))
      subSeq += s1.charAt(s1Pos - 1)
      s1Pos -= 1
      s2Pos -= 1
    }
  }
  subSeq.toString.reverse
}
and ran it with the following configuration: jmh:run -i 10 -wi 10 -f1 -t1
and got the following results:
GraalVM EE 1.0.0-rc10
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  91.411 ± 4.355  ops/ms
GraalVM CE 1.0.0-rc10
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  26.741 ± 0.408  ops/ms
OpenJDK 1.8.0_192
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  45.216 ± 1.956  ops/ms
I also ran another test where I created a list with thousands of objects, filtered and sorted it, and throughput was again lowest on GraalVM CE.
Why this difference?
You get different results because the runtimes you're using ship different top-tier JIT compilers. Unless specified otherwise (with command-line flags, for example):
- GraalVM EE uses the Graal compiler in its Enterprise Edition configuration,
- GraalVM CE uses the open-source Graal compiler,
- OpenJDK 1.8.0_192 uses HotSpot's C2 compiler.
A JIT compiler translates your code at runtime into machine code, and the result depends heavily on the original code, the workload, the JIT configuration, the enabled optimizations, and so on.
It is reasonable to expect that different implementations of the JIT compiler would show different results on the same benchmark.
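If you want to confirm that the compiler really is the variable here, GraalVM can fall back to HotSpot's C2 by disabling the JVMCI-based compiler. With the sbt-jmh plugin, something along these lines should work (double-check the flag against your GraalVM version's documentation):

```
jmh:run -i 10 -wi 10 -f1 -t1 -jvmArgsAppend -XX:-UseJVMCICompiler
```

If the GraalVM CE numbers then move close to the OpenJDK numbers, the difference is attributable to the JIT compiler rather than anything else in the distribution.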
If you're asking why GraalVM CE doesn't show better results on this particular benchmark, rather than the philosophical question about the difference in general, here's a short explanation. All compilers are good at something; Graal, for example, has excellent escape analysis and inlining algorithms, which show great results on code that uses abstractions: allocating objects, calling methods, and so on.
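To make that concrete, here is a minimal sketch (my own illustration, not from any GraalVM material) of the kind of kernel where escape analysis pays off: a short-lived wrapper object is allocated on every iteration, and a compiler with strong escape analysis can scalar-replace those temporaries instead of allocating them on the heap.

```scala
// Hypothetical sketch: an abstraction-heavy loop where Graal's escape
// analysis can eliminate the per-iteration Point allocations.
object EscapeAnalysisSketch {
  final case class Point(x: Int, y: Int) {
    def +(other: Point): Point = Point(x + other.x, y + other.y)
  }

  // Allocates a fresh Point per step; with scalar replacement the loop
  // can compile down to plain int arithmetic.
  def sumPoints(n: Int): Int = {
    var acc = Point(0, 0)
    var i = 0
    while (i < n) {
      acc = acc + Point(i, i)
      i += 1
    }
    acc.x + acc.y
  }

  def main(args: Array[String]): Unit =
    println(sumPoints(1000)) // 2 * (0 + 1 + ... + 999) = 999000
}
```

A compiler without comparable escape analysis has to pay for every one of those allocations, which is exactly where the gap between compilers shows up.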
This particular benchmark fills an array with ints and runs a loop, which probably doesn't give Graal much opportunity to do the things it's good at. So this is an example of a microbenchmark that C2 is better at. You could probably construct a similar benchmark on which GraalVM CE shows superiority over OpenJDK (perhaps try this one: http://www.graalvm.org/docs/examples/java-simple-stream-benchmark/).
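In the spirit of that linked stream example, an abstraction-heavy kernel looks something like the following (the names and numbers here are my own sketch, not taken from that page): lambdas, boxing, and intermediate iterator stages give the optimizer something substantial to inline and allocation-sink.

```scala
// Hedged sketch of a stream-style kernel that favors a compiler with
// aggressive inlining and escape analysis.
object StreamKernel {
  // Each stage allocates closures and boxed intermediates that a good
  // optimizer can fuse into a single loop.
  def kernel(values: Array[Int]): Double =
    values.iterator
      .map(_ * 2)
      .filter(_ % 3 == 0)
      .map(_.toDouble)
      .sum

  def main(args: Array[String]): Unit =
    println(kernel(Array.tabulate(100)(identity)))
}
```

Wrapped in a JMH @Benchmark method (with the input array held in benchmark state so it isn't constant-folded), a kernel like this is where GraalVM CE would plausibly pull ahead of C2.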
The GraalVM team runs a large corpus of benchmarks, and that is the source of the claim that GraalVM CE is faster overall. However, reducing a complex set of benchmark results to a single number is not a very meaningful way to assess the performance of any particular piece of code and its workload. You should always evaluate on your own code.