
What are the differences between LLVM and java bytecode?

People also ask

What is LLVM bytecode?

What is commonly known as the LLVM bitcode file format (also, sometimes anachronistically known as bytecode) is actually two things: a bitstream container format and an encoding of LLVM IR into the container format. The bitstream format is an abstract encoding of structured data, very similar to XML in some ways.

What does LLVM stand for?

LLVM originally stood for Low Level Virtual Machine. The project no longer treats the name as an acronym; "LLVM" is now simply the full name of the project.

Does Java use LLVM?

JLang supports ahead-of-time compilation of Java. It works by adding an LLVM back end to the Polyglot compiler, allowing Java to be translated down to LLVM IR. From there, a back end can translate to the architecture of choice.

Is LLVM cross platform?

LLVM IR can be cross-platform, with the obvious exceptions others have listed. However, that does not mean Clang generates cross-platform code. As you note, the preprocessor is almost universally used to only pass parts of the code to the C/C++ compiler, depending on the platform.


Assuming you mean JVM rather than Java:

LLVM is a low-level, register-based virtual machine. It is designed to abstract the underlying hardware and draw a clean line between a compiler's back end (machine code generation) and its front end (parsing, etc.).

The JVM is a much higher-level, stack-based virtual machine. It provides garbage collection, has a notion of objects and virtual method calls, and more. Thus, the JVM provides much higher-level infrastructure for language interoperability (much like Microsoft's CLR).

(It is possible to build these abstractions over LLVM just as it is possible to build them on top of C.)


It's too bad this question got off on the wrong foot. I came to it looking for a more detailed comparison.

The biggest difference between JVM bytecode and LLVM bitcode is that JVM instructions are stack-oriented, whereas LLVM bitcode is not. This means that rather than loading values into registers, JVM bytecode pushes values onto an operand stack and computes results from there. I believe that an advantage of this is that the compiler producing the bytecode doesn't have to allocate registers, but I'm not sure.
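To make the stack-oriented style concrete, here is a minimal sketch of a toy stack machine in Python. The opcode names (`load`, `add`, `mul`) are made up for illustration and are not real JVM instructions; the point is only that operands are implicit, because every operation pops its inputs from the stack and pushes its result back.

```python
# Toy stack machine. Opcode names are hypothetical, not real JVM opcodes.
def run_stack(program, variables):
    stack = []
    for op, *args in program:
        if op == "load":
            # Push the value of a named variable onto the operand stack.
            stack.append(variables[args[0]])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# (a + b) * c, written without naming any temporaries or registers.
program = [
    ("load", "a"),
    ("load", "b"),
    ("add",),
    ("load", "c"),
    ("mul",),
]

result = run_stack(program, {"a": 2, "b": 3, "c": 4})
print(result)  # 20
```

Note that the program never says where the intermediate sum lives; it simply sits on top of the stack until `mul` consumes it.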

LLVM bitcode is closer to machine-level code, but isn't tied to a particular architecture. For instance, I think that LLVM bitcode can make use of an arbitrary number of virtual registers. Maybe someone more familiar with LLVM can speak up here?
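For contrast with a stack-oriented design, the register-oriented style can be sketched as a toy three-address machine in Python. This is only an illustration under simplifying assumptions: the instruction tuples and the `%`-prefixed names are invented here to resemble LLVM IR's look, and a dictionary stands in for an unbounded set of virtual registers.

```python
# Toy register machine with an unbounded set of virtual registers.
# The instruction layout (dest, op, lhs, rhs) is hypothetical.
def run_registers(program, inputs):
    regs = dict(inputs)  # a dict never runs out of "registers"
    for dest, op, lhs, rhs in program:
        if op == "add":
            regs[dest] = regs[lhs] + regs[rhs]
        elif op == "mul":
            regs[dest] = regs[lhs] * regs[rhs]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# (a + b) * c: every intermediate value gets its own named register.
program = [
    ("%t0", "add", "%a", "%b"),
    ("%t1", "mul", "%t0", "%c"),
]

regs = run_registers(program, {"%a": 2, "%b": 3, "%c": 4})
print(regs["%t1"])  # 20
```

Here every operand is named explicitly. Mapping those virtual registers onto a real machine's limited physical registers is left to a later stage, which is the job a compiler back end takes on.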


JVM bytecode and LLVM bitcode (sometimes still called LLVM bytecode) have similarities and differences. In terms of similarities, both are intermediate program representations. Thus, they can represent programs written in different programming languages. As an example, there are front ends that translate Java, Clojure, Scala, etc. into JVM bytecode, and there are front ends that translate C, C++, Swift, Julia, Rust, etc. into LLVM bitcode.

That said, JVM bytecode and LLVM bitcode are very different in purpose and in design. Historically, JVM bytecode was designed to be distributed over a network, e.g., the Internet, and interpreted on the local computer by a virtual machine. That's one of the reasons why it's stack-based: stack-based bytecode is usually smaller, because instructions do not need to name their operands.

Perhaps LLVM bitcode was also originally meant to be interpreted, but if so, its purpose has changed over time. Today, LLVM bitcode is a program representation meant to be analyzed and optimized. It is encoded in Static Single Assignment (SSA) form, which is more like a mathematical abstraction of a program than actual, executable assembly. For instance, LLVM IR has instructions such as phi functions that have no direct equivalent in typical computer architectures. Thus, although it is possible to interpret LLVM bitcode (a tool called lli, part of the LLVM toolchain, can do that), that's not the most important way in which LLVM IR is used.
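To illustrate what a phi function does, here is a minimal sketch of an SSA-style interpreter in Python. Everything in it is a simplification invented for this example (the tuple-based instruction format, the block labels, the opcode names); it is not how lli or LLVM represent programs. The idea it shows is that each name is assigned exactly once, and a phi picks its value according to which block control arrived from.

```python
# Toy SSA interpreter. Instruction format and opcode names are hypothetical.
def run_ssa(blocks, inputs):
    env = dict(inputs)
    prev, label = None, "entry"
    while True:
        for instr in blocks[label]:
            op = instr[0]
            if op == "phi":
                # Select the incoming value for the block we just left.
                _, dest, incoming = instr
                env[dest] = env[incoming[prev]]
            elif op == "gt":
                _, dest, lhs, rhs = instr
                env[dest] = env[lhs] > env[rhs]
            elif op == "add":
                _, dest, lhs, rhs = instr
                env[dest] = env[lhs] + env[rhs]
            elif op == "br":
                _, cond, if_true, if_false = instr
                prev, label = label, (if_true if env[cond] else if_false)
                break
            elif op == "jmp":
                prev, label = label, instr[1]
                break
            elif op == "ret":
                return env[instr[1]]
            else:
                raise ValueError(f"unknown opcode: {op}")
        else:
            raise ValueError(f"block {label!r} has no terminator")


# Roughly: x = (a + a) if a > b else (b + b); return x
blocks = {
    "entry": [("gt", "cond", "a", "b"), ("br", "cond", "then", "else")],
    "then": [("add", "x1", "a", "a"), ("jmp", "merge")],
    "else": [("add", "x2", "b", "b"), ("jmp", "merge")],
    "merge": [("phi", "x3", {"then": "x1", "else": "x2"}), ("ret", "x3")],
}

print(run_ssa(blocks, {"a": 5, "b": 3}))  # 10
print(run_ssa(blocks, {"a": 1, "b": 3}))  # 6
```

Because `x1` and `x2` are each assigned only once, the merge block needs the phi to express "whichever one was computed". Real hardware has no such instruction, so a back end has to lower phis into ordinary moves or register assignments.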