I've created a benchmark for a method that finds the longest common subsequence using dynamic programming:
@Benchmark
def longestCommonSubsequenceDP(): String = {
  val s1 = "Pellentesque lacinia"
  val s2 = "Mauris purus massa"
  val s1Length = s1.length
  val s2Length = s2.length
  // lcsLengths(i)(j) holds the LCS length of the prefixes s1[0, i) and s2[0, j)
  val lcsLengths = Array.fill[Int](s1Length + 1, s2Length + 1)(0)
  for (i <- 0 until s1Length) {
    for (j <- 0 until s2Length) {
      if (s1.charAt(i) == s2.charAt(j)) {
        lcsLengths(i + 1)(j + 1) = lcsLengths(i)(j) + 1
      } else {
        lcsLengths(i + 1)(j + 1) = math.max(lcsLengths(i)(j + 1), lcsLengths(i + 1)(j))
      }
    }
  }
  // Walk the table backwards to recover one longest common subsequence
  val subSeq = new StringBuilder()
  var s1Pos = s1Length
  var s2Pos = s2Length
  while (s1Pos > 0 && s2Pos > 0) {
    if (lcsLengths(s1Pos)(s2Pos) == lcsLengths(s1Pos - 1)(s2Pos)) {
      s1Pos -= 1
    } else if (lcsLengths(s1Pos)(s2Pos) == lcsLengths(s1Pos)(s2Pos - 1)) {
      s2Pos -= 1
    } else {
      assert(s1.charAt(s1Pos - 1) == s2.charAt(s2Pos - 1))
      subSeq += s1.charAt(s1Pos - 1)
      s1Pos -= 1
      s2Pos -= 1
    }
  }
  subSeq.toString.reverse
}
and ran it with the following configuration: jmh:run -i 10 -wi 10 -f1 -t1
and got the following results:
GraalVM EE 1.0.0-rc10
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  91.411 ± 4.355  ops/ms
GraalVM CE 1.0.0-rc10
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  26.741 ± 0.408  ops/ms
OpenJDK 1.8.0_192
[info] Benchmark                        Mode  Cnt   Score   Error  Units
[info] LCS.longestCommonSubsequenceDP  thrpt   25  45.216 ± 1.956  ops/ms
I also ran another test where I created a list with thousands of objects, filtered and sorted it, and throughput was again lowest on GraalVM CE.
Why this difference?
You get different results because the runtimes you're using ship different top-tier JIT compilers. Unless specified otherwise (with command-line flags, for example):
- GraalVM EE uses the Graal compiler in its Enterprise Edition configuration,
- GraalVM CE uses the open-source Graal compiler,
- OpenJDK 1.8.0_192 uses HotSpot's C2 compiler.
A JIT compiler translates your code at runtime into machine code, and the result depends heavily on the original code, the workload, the JIT configuration, the enabled optimizations, and so on.
It is reasonable to expect that different implementations of the JIT compiler would show different results on the same benchmark.
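If you want to confirm that the compiler really is the variable here, GraalVM can fall back to HotSpot's C2 by disabling the JVMCI-based compiler. With the sbt-jmh plugin, something along these lines should work (double-check the flag against your GraalVM version's documentation):

```
jmh:run -i 10 -wi 10 -f1 -t1 -jvmArgsAppend -XX:-UseJVMCICompiler
```

If the GraalVM CE numbers then move close to the OpenJDK numbers, the difference is attributable to the JIT compiler rather than anything else in the distribution.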
If you're asking why GraalVM CE doesn't show better results on this particular benchmark, rather than the philosophical question about the difference in general, here's a short explanation. All compilers are good at something; Graal, for example, has excellent escape analysis and inlining algorithms, which show great results on code that uses abstractions: allocating objects, calling methods, and so on.
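To make that concrete, here is a minimal sketch (my own illustration, not from any GraalVM material) of the kind of kernel where escape analysis pays off: a short-lived wrapper object is allocated on every iteration, and a compiler with strong escape analysis can scalar-replace those temporaries instead of allocating them on the heap.

```scala
// Hypothetical sketch: an abstraction-heavy loop where Graal's escape
// analysis can eliminate the per-iteration Point allocations.
object EscapeAnalysisSketch {
  final case class Point(x: Int, y: Int) {
    def +(other: Point): Point = Point(x + other.x, y + other.y)
  }

  // Allocates a fresh Point per step; with scalar replacement the loop
  // can compile down to plain int arithmetic.
  def sumPoints(n: Int): Int = {
    var acc = Point(0, 0)
    var i = 0
    while (i < n) {
      acc = acc + Point(i, i)
      i += 1
    }
    acc.x + acc.y
  }

  def main(args: Array[String]): Unit =
    println(sumPoints(1000)) // 2 * (0 + 1 + ... + 999) = 999000
}
```

A compiler without comparable escape analysis has to pay for every one of those allocations, which is exactly where the gap between compilers shows up.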
This particular benchmark fills an array with ints and runs a loop, which probably doesn't give Graal much opportunity to do the things it's good at. So this is an example of a microbenchmark that C2 is better at. You could probably construct a similar benchmark on which GraalVM CE shows superiority over OpenJDK (perhaps try this one: http://www.graalvm.org/docs/examples/java-simple-stream-benchmark/).
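In the spirit of that linked stream example, an abstraction-heavy kernel looks something like the following (the names and numbers here are my own sketch, not taken from that page): lambdas, boxing, and intermediate iterator stages give the optimizer something substantial to inline and allocation-sink.

```scala
// Hedged sketch of a stream-style kernel that favors a compiler with
// aggressive inlining and escape analysis.
object StreamKernel {
  // Each stage allocates closures and boxed intermediates that a good
  // optimizer can fuse into a single loop.
  def kernel(values: Array[Int]): Double =
    values.iterator
      .map(_ * 2)
      .filter(_ % 3 == 0)
      .map(_.toDouble)
      .sum

  def main(args: Array[String]): Unit =
    println(kernel(Array.tabulate(100)(identity)))
}
```

Wrapped in a JMH @Benchmark method (with the input array held in benchmark state so it isn't constant-folded), a kernel like this is where GraalVM CE would plausibly pull ahead of C2.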
The GraalVM team runs a large corpus of benchmarks, and that is the source of the claim that GraalVM CE is faster overall. However, reducing a complex set of benchmark results to a single number is not a very meaningful way to assess the performance of any particular piece of code and its workload. You should always evaluate on your own code.