
Java - Is binary code the same as ByteCode?

The answer depends on what you mean by binary code.

Java bytecode is a binary data format that includes loading information and execution instructions for the Java virtual machine. In that sense, Java bytecode is a special kind of binary code.

If you use the term "binary code" to mean machine instructions for a real processor architecture (like IA-32 or SPARC), then it is different.
Java bytecode is not binary code in that sense: it is not processor-specific.
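
To see the "binary data format" side of this, here is a minimal sketch that reads the first few bytes of a class file. It assumes the class file for java.lang.Object can be read as a resource (which should be the case on common JDKs); the class name ClassFileHeader is made up for this example. Per the JVM specification, every class file starts with the magic number 0xCAFEBABE followed by the minor and major version, no matter which processor the file was compiled on.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    // Reads the first 8 bytes of the class file that defines java.lang.Object.
    // Returns {magic, minorVersion, majorVersion}.
    // Assumption: the JDK exposes Object.class as a readable resource.
    static int[] readHeader() {
        try (InputStream raw = Object.class.getResourceAsStream("Object.class");
             DataInputStream in = new DataInputStream(raw)) {
            int magic = in.readInt();            // 4 bytes, big-endian
            int minor = in.readUnsignedShort();  // 2 bytes
            int major = in.readUnsignedShort();  // 2 bytes
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader();
        System.out.println("magic = 0x" + Integer.toHexString(header[0]));
        System.out.println("class file version = " + header[2] + "." + header[1]);
    }
}
```

The version numbers printed depend on the JDK you run it with; the magic number does not.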


The JVM is a very complex program, and the flow inside it is to some degree unpredictable. For example, the flow inside the HotSpot JVM is something like the following:

1) it takes your bytecode and interprets it
2) if some method is executed frequently enough (a certain number of times within some time span), it is marked as a "hot" method and the JVM schedules its compilation to platform-dependent machine code (is that what you have called binary code?). That flow looks like the following:

ByteCode
--> High-level Intermediate Representation (HIR)
  --> Middle-level Intermediate Representation (MIR)
    --> Low-level Intermediate Representation (LIR)
      --> Register Allocation
        --> EMIT (platform dependent machine code)

Each step in that flow is important and helps the JVM optimize your code. It does not change your algorithm, of course; optimization just means that some sequences of code can be detected and replaced with better-performing code that produces the same result. Starting from the LIR stage, the code becomes platform dependent (!).

Bytecode is good for interpretation, but it is not easy to transform directly into native machine code. HIR takes care of that: its purpose is to quickly transform bytecode into an intermediate representation. MIR transforms all operations into three-operand operations, whereas bytecode is based on stack operations:

iload_0
iload_1
iand

That was the bytecode for a simple and operation; the middle-level representation for it would be something like the following:

and v0 v1 -> v2

LIR depends on the platform. Taking our simple example with the and operation and assuming the platform is x86, the code snippet becomes:

x86_and v1 v0 -> v1
x86_move v1 -> v2

because the x86 and operation takes two operands (the first is the destination, the other is the source), and then we put the result into another "variable". The next stage is register allocation, because the x86 platform (and probably most others) works with registers, not variables (like the intermediate representation) or a stack (like bytecode). Here our code snippet would look like the following:

x86_and eax ecx -> eax

Here you can notice the absence of the "move" operation. Our code contained only one line, and the JVM figured out that creating a new virtual variable was not needed; we can just reuse the eax register. If the code is big enough, with many variables that are used intensively (e.g. eax is used somewhere below, so we can't change its value), then you will see the move operation left in the machine code. That's again about optimization :)
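
To make the stack-based vs. three-operand contrast above a bit more concrete, here is a small sketch in plain Java. It is only an illustration, not what HotSpot does internally; the class and method names (StackVsThreeOperand, evalStack, evalThreeOperand) are made up. The and method is the kind of source whose body javac compiles to iload_0, iload_1, iand (followed by ireturn), which you can check with javap -c.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackVsThreeOperand {
    // Source whose body compiles to: iload_0, iload_1, iand, ireturn
    // (static method, so the two parameters are local variables 0 and 1).
    static int and(int a, int b) {
        return a & b;
    }

    // Mimics how a stack machine evaluates the bytecode sequence.
    static int evalStack(int local0, int local1) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(local0);            // iload_0
        stack.push(local1);            // iload_1
        int right = stack.pop();       // iand pops two operands...
        int left = stack.pop();
        stack.push(left & right);      // ...and pushes the result
        return stack.pop();            // ireturn
    }

    // Mimics the three-operand form: and v0 v1 -> v2
    static int evalThreeOperand(int v0, int v1) {
        int v2 = v0 & v1;
        return v2;
    }

    public static void main(String[] args) {
        System.out.println(and(12, 10));
        System.out.println(evalStack(12, 10));
        System.out.println(evalThreeOperand(12, 10));
    }
}
```

All three calls compute the same value; the difference is only in where the operands live (an operand stack vs. named virtual variables).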

That was the JIT flow, but depending on the VM implementation there can be one more step: if code has been compiled (being "hot") and is still executed many, many times, the JVM schedules further optimization of that code (e.g. using inlining).

Well, the conclusion is that the path from bytecode to machine code is quite interesting, a bit unpredictable, and depends on many, many things.

By the way, the process described above is called "mixed mode interpretation" (the JVM first interprets bytecode and then uses JIT compilation); an example of such a JVM is HotSpot. Some JVMs (like JRockit from Oracle) use JIT compilation only.
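
If you want to watch this happen on HotSpot, a rough experiment is a tiny program with one method that gets called millions of times. The sketch below is just that; the names HotMethodSketch, mask and run are made up, and the iteration count is an arbitrary guess at "hot enough". Running it with the HotSpot option -XX:+PrintCompilation should list methods as they get JIT-compiled, and -Xint forces interpreter-only execution for comparison. What exactly gets printed varies between JVM versions, so treat it as something to poke at rather than a benchmark.

```java
class HotMethodSketch {
    // Small method that is called very often, so it is a candidate
    // for being marked "hot" and compiled by the JIT.
    static int mask(int a, int b) {
        return a & b;
    }

    // Calls mask() 'iterations' times and sums the results.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += mask(i, 0xFF);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Try: java -XX:+PrintCompilation HotMethodSketch
        //  vs: java -Xint HotMethodSketch
        System.out.println(run(5_000_000));
    }
}
```

The result is the same in both modes; only the way the JVM gets there (interpreting vs. compiling to machine code) differs.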

This was a very simple description of what is going on there. I hope it helps in understanding the flow inside the JVM at a very high level, and that it addresses the question about the differences between bytecode and binary code. For references and other related issues not mentioned here, please read the similar topic "Why are compiled Java class files smaller than C compiled files?".

Also, feel free to critique this answer and point out my mistakes or misunderstandings; I'm always willing to improve my knowledge of the JVM :)


There's no such thing as "machine-independent-bytecode" (it wouldn't make any sense if you think about it). Bytecode is only (for the purposes of this answer) used for things like virtual machines. VMs (such as the JVM) INTERPRET the bytecode and use some clever and complicated just-in-time compilation (which IS machine/platform-dependent) to give you the final product.

So in a sense, both of the answers are right and wrong. The Java compiler compiles code into Java bytecode (machine-independent). The *.class files the bytecode is located in are binary - they are executable, after all. The Virtual machine later interprets these binary *.class files (note: when describing files as binary, it's somewhat of a misnomer) and does various awesome stuff. More often than not, the JVM uses something called JIT (just-in-time compilation), which generates either platform-specific, or machine-specific instructions that speed up various parts of execution. JIT is another topic for another day, however.

Edit:

Java File (.java) -> [javac.exe] -> ByteCode File (.class) -> [JVM/Java Interpreter] -> Running it (by first converting it into binary code specific to the machine)

This is incorrect. The JVM doesn't "convert" anything. It simply interprets the bytecode. The only part of the JVM that "converts" bytecode is when the JIT compiler is invoked, which is a special case and should not be generalized.


Both C/C++ (to take an example) and Java programs are compiled into binary code. This generic term just means that the newly created file does not encode the instructions in a human-readable way (i.e. you won't be able to open the compiled file in a text editor and read it).

On the other hand, what the Binary 0's and 1's encode (or represent), depends on what the compiler generated. In the case of Java, it generates instructions called Bytecode, which are interpreted by the JVM. In other cases, for other languages, it may generate IA-32 or SPARC instructions.

In conclusion, the way the terms binary code and Java bytecode are opposed to each other is misleading. The intent was to distinguish between the normal binary code, which is machine dependent, and Java bytecode (also a binary code), which is not.


An answer I found today for the above question:

Source: JLS

Loading refers to the process of finding the binary form of a class or interface type with a particular name, perhaps by computing it on the fly, but more typically by retrieving a binary representation previously computed from source code by a Java compiler, and constructing, from that binary form, a Class object to represent the class or interface.

The precise semantics of loading are given in Chapter 5 of The Java Virtual Machine Specification, Java SE 7 Edition. Here we present an overview of the process from the viewpoint of the Java programming language.

The binary format of a class or interface is normally the class file format described in The Java Virtual Machine Specification, Java SE 7 Edition cited above, but other formats are possible, provided they meet the requirements specified in §13.1. The method defineClass of class ClassLoader may be used to construct Class objects from binary representations in the class file format.
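
As a rough illustration of the last sentence (and of "computing it on the fly"), here is a sketch that assembles the bytes of a tiny class file by hand and passes them to defineClass. The names DefineClassSketch, ByteLoader and OnTheFly are made up for this example, and the generated class is deliberately empty (no fields, no methods), so it can be loaded but not instantiated. The byte layout follows the class file format from the JVM specification; version 52 (Java 8) is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DefineClassSketch {
    // Hypothetical loader that only exposes the protected defineClass method.
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] classFile) {
            return defineClass(name, classFile, 0, classFile.length);
        }
    }

    // Hand-assembles the class file of an empty public class in the
    // default package whose superclass is java.lang.Object.
    static byte[] minimalClassFile(String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);                 // magic
            out.writeShort(0);                        // minor version
            out.writeShort(52);                       // major version (Java 8)
            out.writeShort(5);                        // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name);     // #2 Utf8 (writeUTF adds the length prefix)
            out.writeByte(7); out.writeShort(4);      // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class  = #1
            out.writeShort(3);                        // super_class = #3
            out.writeShort(0);                        // interfaces count
            out.writeShort(0);                        // fields count
            out.writeShort(0);                        // methods count
            out.writeShort(0);                        // attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = minimalClassFile("OnTheFly");
        Class<?> c = new ByteLoader().define("OnTheFly", bytes);
        System.out.println(c.getName() + " extends " + c.getSuperclass().getName());
    }
}
```

In real code the bytes would normally come from a .class file produced by javac, but the point is the same: the JVM only needs a byte array in the class file format to construct a Class object.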