So I am a little confused regarding the verification of bytecode that happens inside a JVM. According to the book by Deitel and Deitel, a Java program goes through five phases (edit, compile, load, verify and execute) (chapter 1). The bytecode verifier verifies the bytecode during the 'verify' stage. Nowhere does the book mention that the bytecode verifier is a part of the classloader.
However, according to the Oracle docs, the class loader performs loading, linking and initialization, and it has to verify the bytecode during linking.
Now, are the bytecode verification that Deitel and Deitel describe and the bytecode verification that this Oracle document describes the same process?
Or does bytecode verification happen twice, once during linking and once more in the bytecode verifier?
[Image: the phases of a Java program as described in the book by Deitel and Deitel. I borrowed this picture from one of the answers below, by nobalG :)]
The bytecode verifier traverses the bytecodes, constructs the type state information, and verifies the types of the parameters to all the bytecode instructions.
When a class loader presents the bytecodes of a newly loaded Java platform class to the virtual machine, these bytecodes are first inspected by a verifier. The verifier checks that the instructions cannot perform actions that are obviously damaging.
Just as there are many different machine instruction sets, there are many different bytecode instruction sets. Some, like Java bytecode, are a documented part of a platform. All Java virtual machines execute exactly the same bytecode, by definition.
Thus, the JVM performs a static analysis at loading time called class verification [1]. This verification includes bytecode verification to make sure that the byte code of the applet is proved to be semantically correct and cannot execute ill-typed operations at run time.
This diagram, which the Oracle docs explain in detail, may help you understand bytecode verification.
You will find that bytecode verification happens only once, not twice.
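One way to see that loading and initialization are separate steps, with linking (and therefore verification) somewhere in between, is the three-argument `Class.forName`. The sketch below is only an illustration: `PhasesSketch`, `Lazy` and `run` are made-up names, and exactly when the JVM links and verifies the class between the two points is up to the implementation, as long as it is done before initialization.

```java
public class PhasesSketch {
    // Records the order of events so it can be inspected afterwards.
    static final StringBuilder LOG = new StringBuilder();

    // Hypothetical class whose static initializer shows when initialization runs.
    static class Lazy {
        static { LOG.append("init;"); }
        static int value() { return 7; }
    }

    static String run() {
        LOG.setLength(0);
        try {
            // initialize = false: load the class, but do not run its static initializer.
            Class.forName(PhasesSketch.class.getName() + "$Lazy", false,
                          PhasesSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        LOG.append("loaded;");
        // First active use: the class must be linked (verified) and initialized before this call runs.
        int v = Lazy.value();
        LOG.append("used=").append(v).append(';');
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // first call prints: loaded;init;used=7;
    }
}
```

The `init;` entry appears only on the first call, because a class is initialized at most once per class loader.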
The illustration shows the flow of data and control from Java language source code through the Java compiler, to the class loader and bytecode verifier and hence on to the Java virtual machine, which contains the interpreter and runtime system. The important issue is that the Java class loader and the bytecode verifier make no assumptions about the primary source of the bytecode stream--the code may have come from the local system, or it may have travelled halfway around the planet. The bytecode verifier acts as a sort of gatekeeper: it ensures that code passed to the Java interpreter is in a fit state to be executed and can run without fear of breaking the Java interpreter. Imported code is not allowed to execute by any means until after it has passed the verifier's tests. Once the verifier is done, a number of important properties are known:
- There are no operand stack overflows or underflows
- The types of the parameters of all bytecode instructions are known to always be correct
- Object field accesses are known to be legal--private, public, or protected
While all this checking appears excruciatingly detailed, by the time the bytecode verifier has done its work, the Java interpreter can proceed, knowing that the code will run securely. Knowing these properties makes the Java interpreter much faster, because it doesn't have to check anything. There are no operand type checks and no stack overflow checks. The interpreter can thus function at full speed without compromising reliability.
EDIT:-
From Oracle Docs Section 5.3.2:
When the loadClass method of the class loader L is invoked with the name N of a class or interface C to be loaded, L must perform one of the following two operations in order to load C:
- The class loader L can create an array of bytes representing C as the bytes of a ClassFile structure (§4.1); it then must invoke the method defineClass of class ClassLoader. Invoking defineClass causes the Java Virtual Machine to derive a class or interface denoted by N using L from the array of bytes using the algorithm found in §5.3.5.
- The class loader L can delegate the loading of C to some other class loader L'. This is accomplished by passing the argument N directly or indirectly to an invocation of a method on L' (typically the loadClass method). The result of the invocation is C.
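The two options above can be sketched with a small custom loader. This is a minimal illustration, not how production loaders are usually written (they normally override `findClass` and keep parent-first delegation); `DefineClassSketch`, `SketchLoader`, `Payload`, `loadCopy` and `answerFrom` are hypothetical names, and the sketch assumes the `.class` file of `Payload` is reachable as a resource through the parent loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DefineClassSketch {

    /** Hypothetical payload class that the custom loader defines a second time. */
    public static class Payload {
        public static int answer() { return 42; }
    }

    /** Defines one target class itself and delegates every other name to its parent. */
    static class SketchLoader extends ClassLoader {
        private final String target;

        SketchLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // second option: delegate to L'
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    // First option: hand the ClassFile bytes to the JVM.
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) throw new ClassNotFoundException(name);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Payload through a fresh SketchLoader. */
    static Class<?> loadCopy() {
        String name = Payload.class.getName();
        try {
            return new SketchLoader(Payload.class.getClassLoader(), name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls the static answer() method of the given class reflectively. */
    static int answerFrom(Class<?> c) {
        try {
            return (Integer) c.getMethod("answer").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The class returned by `loadCopy()` has the same name as `Payload` but is a distinct class, because it was defined by a different loader. The bytes passed to `defineClass` are the ones the JVM verifies before the class can be used.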
As Holger correctly commented, here is an attempt to explain it further with the help of an example:
    static int factorial(int n) {
        int res;
        for (res = 1; n > 0; n--) res = res * n;
        return res;
    }
The corresponding byte code would be
    method static int factorial(int), 2 registers, 2 stack slots
     0: iconst_1     // push the integer constant 1
     1: istore_1     // store it in register 1 (the res variable)
     2: iload_0      // push register 0 (the n parameter)
     3: ifle 14      // if negative or null, go to PC 14
     6: iload_1      // push register 1 (res)
     7: iload_0      // push register 0 (n)
     8: imul         // multiply the two integers at top of stack
     9: istore_1     // pop result and store it in register 1
    10: iinc 0, -1   // decrement register 0 (n) by 1
    11: goto 2       // go to PC 2
    14: iload_1      // load register 1 (res)
    15: ireturn      // return its value to caller
Note that most JVM instructions are typed: the i prefix in iload, istore and imul, for example, marks them as operating on int values.
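To make this more concrete, here is a toy checker for the factorial listing above. It is only a sketch under strong assumptions: every value is an int, so "type state" shrinks to the stack depth plus a bitmask of initialized registers, and the instructions are already decoded by hand. `ToyVerifier`, `Insn`, `verify` and `paramMask` are hypothetical names. A real verifier tracks full types and handles the whole instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ToyVerifier {

    /** One decoded instruction: mnemonic, operand (register or branch target), fall-through PC. */
    static final class Insn {
        final String op; final int arg; final int next;
        Insn(String op, int arg, int next) { this.op = op; this.arg = arg; this.next = next; }
    }

    /** The factorial listing, keyed by PC (next = -1 where there is no fall-through). */
    static Map<Integer, Insn> factorial() {
        Map<Integer, Insn> code = new HashMap<>();
        code.put(0, new Insn("iconst_1", 0, 1));
        code.put(1, new Insn("istore", 1, 2));
        code.put(2, new Insn("iload", 0, 3));
        code.put(3, new Insn("ifle", 14, 6));
        code.put(6, new Insn("iload", 1, 7));
        code.put(7, new Insn("iload", 0, 8));
        code.put(8, new Insn("imul", 0, 9));
        code.put(9, new Insn("istore", 1, 10));
        code.put(10, new Insn("iinc", 0, 11));
        code.put(11, new Insn("goto", 2, -1));
        code.put(14, new Insn("iload", 1, 15));
        code.put(15, new Insn("ireturn", 0, -1));
        return code;
    }

    /** Returns "ok" or a description of the first violation found. */
    static String verify(Map<Integer, Insn> code, int maxStack, int paramMask) {
        if (!code.containsKey(0)) return "empty code";
        // Abstract state per PC: {stack depth, bitmask of initialized registers}.
        Map<Integer, int[]> states = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        states.put(0, new int[] {0, paramMask});
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop();
            Insn in = code.get(pc);
            int depth = states.get(pc)[0];
            int init = states.get(pc)[1];
            int pops = 0, pushes = 0;
            int[] succ = {in.next};
            switch (in.op) {
                case "iconst_1": pushes = 1; break;
                case "iload":
                case "iinc":
                    if ((init & (1 << in.arg)) == 0)
                        return "uninitialized register " + in.arg + " at pc " + pc;
                    pushes = in.op.equals("iload") ? 1 : 0;
                    break;
                case "istore": pops = 1; init |= 1 << in.arg; break;
                case "imul": pops = 2; pushes = 1; break;
                case "ifle": pops = 1; succ = new int[] {in.arg, in.next}; break;
                case "goto": succ = new int[] {in.arg}; break;
                case "ireturn": pops = 1; succ = new int[0]; break;
                default: return "unknown opcode " + in.op + " at pc " + pc;
            }
            if (depth < pops) return "stack underflow at pc " + pc;
            depth = depth - pops + pushes;
            if (depth > maxStack) return "stack overflow at pc " + pc;
            for (int target : succ) {
                if (!code.containsKey(target)) return "jump outside code at pc " + pc;
                int[] old = states.get(target);
                if (old == null) {                       // first visit: record state
                    states.put(target, new int[] {depth, init});
                    work.push(target);
                } else if (old[0] != depth) {            // paths must agree on stack height
                    return "stack height mismatch at pc " + target;
                } else if ((old[1] & init) != old[1]) {  // keep only registers set on every path
                    old[1] &= init;
                    work.push(target);
                }
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Register 0 (n) is a parameter, so it starts out initialized.
        System.out.println(verify(factorial(), 2, 0b01)); // prints "ok"
    }
}
```

Calling `verify(factorial(), 1, 0b01)` instead reports a stack overflow at PC 7, where the second `iload` would need two stack slots. All of this is decided before a single instruction is executed.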
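Now you should note that proper operation of the JVM is not guaranteed unless the code meets at least the following conditions:
- Type correctness: the arguments of an instruction are always of the types expected by the instruction.
- No stack overflow or underflow: an instruction never pops an argument off an empty stack, nor pushes a result on a full stack.
- Code containment: the program counter always points within the code of the method, to the beginning of a valid instruction.
- Register initialization: a load from a register always follows at least one store to that register.
- Object initialization: when an instance of a class is created, one of its initialization methods (constructors) must be invoked before the instance can be used.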
The purpose of byte code verification is to check these conditions once and for all, by static analysis of the byte code at load time. Byte code that passes verification can then be executed faster.
Note also that the purpose of byte code verification is to shift the checks listed above from run time to load time.
The above explanation has been taken from Java bytecode verification: algorithms and formalizations