I've seen multiple claims recently that Java (and JVM-based languages such as Scala) is comparable in performance to C/C++ code.
For example, from the description of the ScalaLab project:
The speed of Scala based scripting, that approaches the speed of native and optimized Java code, and thus is close to, or even better from C/C++ based scientific code!
Can someone point me to a summary of what these JVM optimizations are? Are there any real benchmarks supporting this claim or providing some real-world comparison?
The fastest of them all is GraalVM EE 17, but the difference compared to OpenJDK 8/OpenJDK 17 is marginal. [Graph 1: median (typical) 256-byte message latency in ns for the various JDK variants; lower is better.]
Since Java is multithreaded and can run different threads in a single OS process, a single JVM process can run a very complex Java system such as a multi-tenant application server and all its apps.
Possible causes for slow JVM startup: the application might be waiting to import files; a large number of methods might have to be compiled; or there might be a problem in code optimization (especially on single-CPU machines).
First, it depends on which JVM you are talking about, since there are several - but I'm going to assume you mean Oracle HotSpot (and in any case, the other top-tier JVMs will use similar techniques).
For that JVM, this list from the HotSpot internal wiki provides a great start (and the child pages go into detail on some of the more interesting techniques). If you are just looking for a laundry list of tricks, the wiki has that too, although to make sense of them you'll probably have to google the individual terms.
Not all of these have been implemented recently, but some of the big ones have (range check elision, escape analysis, superword optimizations) - at least for a loose definition of "recently".
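To make escape analysis concrete, here is a minimal sketch (the `Point` class and numbers are my own invention, and whether scalar replacement actually kicks in depends on the JVM version and JIT flags): the temporary object never escapes its method, so HotSpot can replace it with two local doubles and allocate nothing at all.

```java
public class EscapeDemo {
    // A short-lived value object that never escapes distance():
    // a candidate for scalar replacement after escape analysis.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distance(double x, double y) {
        Point p = new Point(x, y);              // never escapes this method
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += distance(3, 4);              // a million "allocations", ideally zero GC pressure
        }
        System.out.println((long) sum);         // prints 5000000
    }
}
```

You can observe the effect by running with `-XX:-DoEscapeAnalysis` and comparing GC activity; with the optimization enabled, the allocation typically disappears entirely.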
Next, let's look at the relative performance picture for C/C++ vs Java, and why the techniques above either help narrow the gap or, in some cases, give Java an intrinsic advantage over natively compiled languages.
At a high level, the optimizations are a mix of things that you'd see in any decent compiler for native languages like C and C++, along with things that are needed to reduce the impact of Java/JVM-specific features and safety checks, such as null checks, array bounds checks, and garbage-collector write barriers.
Many of these JVM-specific* optimizations only help bring the JVM up to parity with native languages, in that they are addressing hurdles the native languages don't have to deal with. A few optimizations, however, are things that a statically compiled language can't manage (or can manage in some cases only with profile-guided optimization, which is rare and necessarily one-size-fits-all anyway), such as dynamic, profile-driven inlining and speculative optimizations that can be undone later by deoptimization.
The consensus seems to be that Java often produces code similar in speed to a good C++ compiler at a moderate optimization level, such as gcc -O2, although a lot depends on the exact benchmark. Modern JVMs like HotSpot tend to excel at low-level array traversal and math (as long as the competing compiler isn't vectorizing - that's hard to beat), and in scenarios with heavy object allocation when the competing code does a similar number of allocations (JVM object allocation plus GC is generally faster than malloc). They fall down when the memory footprint of typical Java applications is a factor, when stack allocation is heavily used, or when vectorizing compilers or intrinsics tip the scales towards the native code.
If you search for Java vs C performance, you'll find plenty of people who have tackled this question, with varying levels of rigor. Here's the first one I stumbled across, which seems to show a rough tie between gcc and HotSpot (even at -O3 in this case). This post and the linked discussions are probably a better start if you want to see how a single benchmark can go through several iterations in each language, leapfrogging each other - and they show some of the limits of optimization on both sides.
*well not really JVM-specific - most would also apply to other safe or managed languages like the CLR
1 This particular optimization is becoming more and more relevant as new instruction sets (particularly SIMD instructions, but there are others) are released with some frequency. Automatic vectorization can speed up some code massively, and while Java has been slow off the mark here, it is at least catching up.
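The kind of loop HotSpot's superword pass targets looks like this sketch (array sizes and names are my own; whether it actually emits SIMD instructions depends on the CPU, JVM version, and flags like -XX:-UseSuperWord):

```java
public class SimdDemo {
    public static void main(String[] args) {
        float[] a = new float[1024], b = new float[1024], c = new float[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // A simple, dependence-free loop over primitive arrays: a typical
        // candidate for HotSpot's superword auto-vectorization (SSE/AVX on x86).
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
        System.out.println((long) c[1023]);   // 1023 + 2046 = 3069
    }
}
```

Loops with early exits, irregular strides, or calls in the body generally defeat this pass, which is one reason hand-vectorized native code can still win.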
Actual performance of course depends on benchmarks and differs by application. But it is easy to see how JIT VMs can be just as fast as statically compiled code, at least in theory.
The main strength of JIT-compiled code is that it can optimize based on information known only at runtime. In C, when you link against a shared library (DLL), every call into it goes through indirection and can't be inlined across that boundary. Under a JIT, even a function that was loaded at runtime can be inlined into its callers, thanks to just-in-time compilation.
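A small sketch of that idea (names are my own; actual inlining depends on warm-up and JIT heuristics): the concrete implementation behind the interface is only known at runtime, yet after profiling HotSpot can still inline the call into the hot loop.

```java
import java.util.function.IntUnaryOperator;

public class InlineDemo {
    public static void main(String[] args) {
        // The concrete target of applyAsInt() is only known at runtime,
        // but once the loop is hot, HotSpot can inline it anyway.
        IntUnaryOperator addOne = x -> x + 1;
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += addOne.applyAsInt(i);
        }
        System.out.println(sum);   // prints 500000500000
    }
}
```

A static C compiler calling the equivalent function through a shared-library pointer has no comparable opportunity without profile-guided or link-time optimization.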
Another example is optimizing based on runtime values. In C/C++ you use a preprocessor macro (NDEBUG) to disable asserts and have to recompile if you want to change this option. In Java, asserts are handled by a static boolean field and an if branch in the code. But since the VM can compile a version of the code that either includes or excludes the assert body depending on the value of the flag, there is little or no performance hit.
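For example (the method here is my own illustration): the assert below is compiled behind a flag that is false unless you run with -ea, and the JIT can eliminate the never-taken branch entirely.

```java
public class AssertDemo {
    static int divide(int a, int b) {
        // Only executed when the JVM is started with -ea (enable assertions).
        // Otherwise this compiles to a branch on an always-false static flag,
        // which the JIT removes as dead code.
        assert b != 0 : "divisor must be non-zero";
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));   // prints 5
    }
}
```

Toggling the behavior is a matter of `java AssertDemo` vs `java -ea AssertDemo` - no recompilation needed, unlike flipping NDEBUG in C.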
Another major VM innovation is polymorphic inlining. Idiomatic Java leans heavily on small wrapper methods like getters and setters, so inlining them is obviously necessary for good performance. Not only can the VM inline a virtual call in the common case where only one receiver type is actually seen, it can inline call sites that dispatch to a few different types by building an inline cache with the appropriate code. If the site ever starts operating on lots of different types, the VM can detect this and fall back to slower virtual dispatch.
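A sketch of what such a call site looks like (the shape classes are my own; the actual inlining decisions depend on the profile HotSpot collects): the area() call below sees exactly two receiver types, a "bimorphic" site HotSpot can handle by inlining both bodies behind a type check.

```java
interface Shape { double area(); }

class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class InlineCacheDemo {
    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Circle(1) };
        double total = 0;
        for (int i = 0; i < 100_000; i++) {
            // Bimorphic call site: only Square and Circle are ever seen,
            // so HotSpot can inline both implementations via an inline cache.
            total += shapes[i % 2].area();
        }
        System.out.println((long) total);
    }
}
```

Add a third or fourth implementation to the hot path and the site goes "megamorphic", at which point the VM deoptimizes back to a plain virtual dispatch.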
A static compiler of course can do none of this; powerful static analysis only gets you so far. This isn't limited to Java either, though it's the most obvious example. Google's V8 VM for JavaScript is also pretty fast. PyPy aims to do the same for Python and Rubinius for Ruby, but they're not quite there (it helps when you have a big corporation backing you).