
Does verification of byte code happen twice? [duplicate]

So I am a little confused about the bytecode verification that happens inside a JVM. According to the book by Deitel and Deitel (Chapter 1), a Java program goes through five phases: edit, compile, load, verify, and execute. The bytecode verifier verifies the bytecode during the 'verify' phase. Nowhere does the book mention that the bytecode verifier is part of the class loader.

However, according to Oracle's documentation, the class loader performs loading, linking, and initialization, and during linking it has to verify the bytecode.
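The three stages named in the Oracle document can be told apart from ordinary Java code. The sketch below is illustrative (the class names `LoadVsInit` and `Target` are made up): it uses `Class.forName(String, boolean, ClassLoader)` with `initialize` set to `false`, which loads a class without initializing it, and the static initializer only runs on first active use. The specification requires a class to be linked, and therefore verified, before it is initialized, but it leaves the exact moment of linking to the JVM, so this only shows that verification must be finished by the time the initializer runs, not exactly when it happened.

```java
final class LoadVsInit {
    static boolean initialized; // set to true by Target's static initializer

    static final class Target {
        static { LoadVsInit.initialized = true; } // runs only when Target is initialized
        static int answer() { return 42; }
    }

    /** Loads Target with initialize=false, then reports whether its initializer has run. */
    static boolean loadOnly() {
        try {
            Class.forName(LoadVsInit.class.getName() + "$Target", false, LoadVsInit.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return initialized;
    }

    /** Invoking a static method of Target is an active use, which triggers initialization. */
    static boolean firstUse() {
        Target.answer();
        return initialized;
    }

    public static void main(String[] args) {
        System.out.println("initialized after loading:   " + loadOnly()); // false
        System.out.println("initialized after first use: " + firstUse()); // true
    }
}
```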

Now, is the bytecode verification that Deitel and Deitel talk about the same process as the bytecode verification that this Oracle document talks about?

Or does bytecode verification happen twice: once during linking and once more by the bytecode verifier?

Picture describing the phases of a Java program as described in the book by Deitel and Deitel (I borrowed this picture from one of the answers below by nobalG :) ):

[Figure: phases of a Java program according to Deitel and Deitel]

Smrita asked Aug 28 '14 06:08



People also ask

What verification is done by the bytecode verifier?

The bytecode verifier traverses the bytecodes, constructs the type state information, and verifies the types of the parameters to all the bytecode instructions.

What is byte code verification?

When a class loader presents the bytecodes of a newly loaded Java platform class to the virtual machine, these bytecodes are first inspected by a verifier. The verifier checks that the instructions cannot perform actions that are obviously damaging.

Is all bytecode the same?

Just as there are many different machine instruction sets, there are many different bytecode instruction sets. Some, like Java bytecode, are a documented part of a platform. All Java virtual machines execute exactly the same bytecode, by definition.

Why do Java bytecodes need to be verified at run time?

The JVM therefore performs a static analysis at load time, called class verification [1]. This includes bytecode verification, which makes sure that the bytecode of the applet is proved to be semantically correct and cannot execute ill-typed operations at run time.


1 Answer

You can understand bytecode verification with the help of this diagram, which is explained in detail in the Oracle docs:

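[Figure: data and control flow from Java source code through the Java compiler to the class loader and bytecode verifier, and on to the Java virtual machine (interpreter and runtime system)]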

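You will find that bytecode verification happens only once, not twice: the 'verify' phase in Deitel and Deitel and the verification step of linking in the Oracle document describe the same process.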

The illustration shows the flow of data and control from Java language source code through the Java compiler, to the class loader and bytecode verifier and hence on to the Java virtual machine, which contains the interpreter and runtime system. The important issue is that the Java class loader and the bytecode verifier make no assumptions about the primary source of the bytecode stream--the code may have come from the local system, or it may have travelled halfway around the planet. The bytecode verifier acts as a sort of gatekeeper: it ensures that code passed to the Java interpreter is in a fit state to be executed and can run without fear of breaking the Java interpreter. Imported code is not allowed to execute by any means until after it has passed the verifier's tests. Once the verifier is done, a number of important properties are known:

  • There are no operand stack overflows or underflows
  • The types of the parameters of all bytecode instructions are known to always be correct
  • Object field accesses are known to be legal--private, public, or protected

While all this checking appears excruciatingly detailed, by the time the bytecode verifier has done its work, the Java interpreter can proceed, knowing that the code will run securely. Knowing these properties makes the Java interpreter much faster, because it doesn't have to check anything. There are no operand type checks and no stack overflow checks. The interpreter can thus function at full speed without compromising reliability.

EDIT:

From the Java Virtual Machine Specification (Oracle docs), Section 5.3.2:

When the loadClass method of the class loader L is invoked with the name N of a class or interface C to be loaded, L must perform one of the following two operations in order to load C:

  • The class loader L can create an array of bytes representing C as the bytes of a ClassFile structure (§4.1); it then must invoke the method defineClass of class ClassLoader. Invoking defineClass causes the Java Virtual Machine to derive a class or interface denoted by N using L from the array of bytes using the algorithm found in §5.3.5.
  • The class loader L can delegate the loading of C to some other class loader L'. This is accomplished by passing the argument N directly or indirectly to an invocation of a method on L' (typically the loadClass method). The result of the invocation is C.
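The first option in this quote (create the bytes of a ClassFile structure and pass them to `defineClass`) can be tried from plain Java, and it also shows where verification comes in. The sketch below hand-assembles a minimal class file; the class name `Bad`, the method `f`, and the helper names are made up for illustration. Its only method pushes an int with `iconst_1` and returns it with `areturn`, which requires a reference, so the bytes are well-formed but ill-typed. On HotSpot with default settings I would expect `defineClass` to accept them and the error to appear only when the class is linked, here forced by `Class.forName(name, true, loader)`. The specification leaves the exact timing of verification to the implementation, which is why the sketch catches `LinkageError` around both steps.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class BadClassDemo {
    /** Option 1 from the spec: turn an array of bytes into a class with defineClass. */
    static final class ByteLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    /** Hand-assembles class Bad with one method: public static Object f() { iconst_1; areturn }. */
    static byte[] badClassBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);                              // magic
            out.writeShort(0); out.writeShort(52);                  // minor, major version (Java 8)
            out.writeShort(8);                                      // constant_pool_count: entries #1..#7
            out.writeByte(1); out.writeUTF("Bad");                  // #1 Utf8 (writeUTF = u2 length + modified UTF-8)
            out.writeByte(7); out.writeShort(1);                    // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");     // #3 Utf8
            out.writeByte(7); out.writeShort(3);                    // #4 Class -> #3
            out.writeByte(1); out.writeUTF("f");                    // #5 Utf8
            out.writeByte(1); out.writeUTF("()Ljava/lang/Object;"); // #6 Utf8
            out.writeByte(1); out.writeUTF("Code");                 // #7 Utf8
            out.writeShort(0x0021);                                 // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2); out.writeShort(4);                   // this_class, super_class
            out.writeShort(0); out.writeShort(0);                   // interfaces_count, fields_count
            out.writeShort(1);                                      // methods_count
            out.writeShort(0x0009);                                 // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5); out.writeShort(6);                   // name "f", descriptor
            out.writeShort(1);                                      // one method attribute: Code
            out.writeShort(7); out.writeInt(14);                    // "Code", attribute_length
            out.writeShort(1); out.writeShort(0);                   // max_stack, max_locals
            out.writeInt(2);                                        // code_length
            out.writeByte(0x04);                                    // iconst_1: pushes an int
            out.writeByte(0xB0);                                    // areturn: needs a reference
            out.writeShort(0); out.writeShort(0);                   // exception table, Code attributes
            out.writeShort(0);                                      // class attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Defines Bad, forces linking by initializing it, and names the error thrown (or "none"). */
    static String defineAndLink() {
        ByteLoader loader = new ByteLoader();
        try {
            Class<?> c = loader.define(badClassBytes());
            Class.forName(c.getName(), true, loader); // initialization requires linking first
            return "none";
        } catch (LinkageError | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(defineAndLink()); // expected on HotSpot with default settings: VerifyError
    }
}
```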

As Holger correctly pointed out in a comment, an example makes this clearer:

    static int factorial(int n) {
        int res;
        for (res = 1; n > 0; n--)
            res = res * n;
        return res;
    }

The corresponding bytecode would be:

    method static int factorial(int), 2 registers, 2 stack slots
     0: iconst_1      // push the integer constant 1
     1: istore_1      // store it in register 1 (the res variable)
     2: iload_0       // push register 0 (the n parameter)
     3: ifle 16       // if less than or equal to zero, go to PC 16
     6: iload_1       // push register 1 (res)
     7: iload_0       // push register 0 (n)
     8: imul          // multiply the two integers at top of stack
     9: istore_1      // pop result and store it in register 1
    10: iinc 0, -1    // decrement register 0 (n) by 1
    13: goto 2        // go to PC 2
    16: iload_1       // load register 1 (res)
    17: ireturn       // return its value to caller
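To see what these typed instructions do at run time, the listing can be run by a small interpreter. This is a hypothetical Java sketch, not how HotSpot executes bytecode: the `Op` and `Insn` names are invented, branch targets are instruction indices instead of byte offsets, and `iinc` is specialized to register 0. Like the interpreter in the Oracle excerpt above, it performs no type or stack-depth checks of its own and simply trusts the code.

```java
final class ToyInterpreter {
    // Hypothetical opcodes mirroring the listing; IINC_0 adds its argument to register 0.
    enum Op { ICONST_1, ISTORE_1, ILOAD_0, ILOAD_1, IFLE, IMUL, IINC_0, GOTO, IRETURN }

    static final class Insn {
        final Op op;
        final int arg; // branch target (instruction index) or increment; unused otherwise
        Insn(Op op, int arg) { this.op = op; this.arg = arg; }
        Insn(Op op) { this(op, 0); }
    }

    // factorial(int) from the listing, one array entry per instruction.
    static final Insn[] FACTORIAL = {
        new Insn(Op.ICONST_1),    //  0: push 1
        new Insn(Op.ISTORE_1),    //  1: res = 1
        new Insn(Op.ILOAD_0),     //  2: push n
        new Insn(Op.IFLE, 10),    //  3: if n <= 0, go to index 10
        new Insn(Op.ILOAD_1),     //  4: push res
        new Insn(Op.ILOAD_0),     //  5: push n
        new Insn(Op.IMUL),        //  6: replace the two ints with their product
        new Insn(Op.ISTORE_1),    //  7: res = product
        new Insn(Op.IINC_0, -1),  //  8: n = n - 1
        new Insn(Op.GOTO, 2),     //  9: back to the loop test
        new Insn(Op.ILOAD_1),     // 10: push res
        new Insn(Op.IRETURN),     // 11: return it
    };

    /** Executes code with register 0 = arg; makes no type or stack-depth checks of its own. */
    static int run(Insn[] code, int arg) {
        int[] regs = { arg, 0 };   // 2 registers: n and res
        int[] stack = new int[2];  // 2 stack slots
        int sp = 0, pc = 0;
        while (true) {
            Insn in = code[pc];
            switch (in.op) {
                case ICONST_1: stack[sp++] = 1; pc++; break;
                case ISTORE_1: regs[1] = stack[--sp]; pc++; break;
                case ILOAD_0:  stack[sp++] = regs[0]; pc++; break;
                case ILOAD_1:  stack[sp++] = regs[1]; pc++; break;
                case IFLE:     pc = stack[--sp] <= 0 ? in.arg : pc + 1; break;
                case IMUL:     { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a * b; pc++; break; }
                case IINC_0:   regs[0] += in.arg; pc++; break;
                case GOTO:     pc = in.arg; break;
                case IRETURN:  return stack[--sp];
                default:       throw new IllegalStateException("unknown opcode " + in.op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(FACTORIAL, 5)); // 120
    }
}
```

Calling `run(FACTORIAL, 5)` goes around the loop five times and returns 120. Given ill-formed code, for example an `ireturn` with an empty stack, this interpreter would only fail through Java's own array bounds check (an `ArrayIndexOutOfBoundsException`), which is the kind of failure that load-time verification is meant to rule out.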

Note that most JVM instructions are typed: the i prefix in iload, istore, imul and ireturn means that they operate on int values.

Proper operation of the JVM is not guaranteed unless the code meets at least the following conditions:

  • Type correctness: the arguments of an instruction are always of the types expected by the instruction.
  • No stack overflow or underflow: an instruction never pops an argument off an empty stack, nor pushes a result on a full stack (whose size is equal to the maximal stack size declared for the method).
  • Code containment: the program counter must always point within the code for the method, to the beginning of a valid instruction encoding (no falling off the end of the method code; no branches into the middle of an instruction encoding).
  • Register initialization: a load from a register must always follow at least one store in this register; in other words, registers that do not correspond to method parameters are not initialized on method entry, and it is an error to load from an uninitialized register.
  • Object initialization: when an instance of a class C is created, one of the initialization methods for class C (corresponding to the constructors for this class) must be invoked before the class instance can be used.

The purpose of bytecode verification is to check these conditions once and for all, by static analysis of the bytecode at load time. Bytecode that passes verification can then be executed faster.
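The conditions above can be checked by a dataflow analysis in the spirit of the paper cited at the end of this answer: simulate the code on types instead of values, and merge the states that reach the same instruction along different paths. The sketch below does this for a hypothetical toy instruction set (the `VType`, `Op` and `Insn` names are invented, and branch targets are instruction indices). It checks type correctness, stack overflow and underflow, code containment, and register initialization, but omits object initialization. It is not HotSpot's verifier: for modern class files (Java 7 and later) HotSpot instead type-checks each method against StackMapTable frames emitted by the compiler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ToyVerifier {
    enum VType { INT, REF, TOP } // TOP: never stored, or different types on different paths
    enum Op { ICONST, ILOAD, ALOAD, ISTORE, IINC, IMUL, IFLE, GOTO, IRETURN, ARETURN }
    static final class Insn {
        final Op op; final int arg; // arg: register index or branch target (instruction index)
        Insn(Op op, int arg) { this.op = op; this.arg = arg; }
        Insn(Op op) { this(op, 0); }
    }
    static final class State { // types on the operand stack and in registers, on entry to an instruction
        final List<VType> stack; final VType[] regs;
        State(List<VType> stack, VType[] regs) { this.stack = stack; this.regs = regs; }
        State copy() { return new State(new ArrayList<>(stack), regs.clone()); }
    }

    /** Returns null if code verifies, else the first error. Assumes non-empty code, params.length <= maxLocals. */
    static String verify(Insn[] code, VType[] params, int maxLocals, int maxStack) {
        VType[] regs = new VType[maxLocals];
        Arrays.fill(regs, VType.TOP);                        // non-parameter registers start uninitialized
        System.arraycopy(params, 0, regs, 0, params.length);
        State[] in = new State[code.length];                 // merged entry state per instruction
        in[0] = new State(new ArrayList<>(), regs);
        Deque<Integer> work = new ArrayDeque<>(List.of(0));
        while (!work.isEmpty()) {
            int pc = work.pop(); Insn i = code[pc];
            State s = in[pc].copy();
            List<Integer> next = new ArrayList<>();
            try {
                switch (i.op) {
                    case ICONST:  push(s, VType.INT, maxStack); break;
                    case ILOAD:   push(s, load(s, i.arg, VType.INT), maxStack); break;
                    case ALOAD:   push(s, load(s, i.arg, VType.REF), maxStack); break;
                    case ISTORE:  s.regs[reg(s, i.arg)] = pop(s, VType.INT); break;
                    case IINC:    load(s, i.arg, VType.INT); break;
                    case IMUL:    pop(s, VType.INT); pop(s, VType.INT); push(s, VType.INT, maxStack); break;
                    case IFLE:    pop(s, VType.INT); next.add(i.arg); break;
                    case GOTO:    next.add(i.arg); break;
                    case IRETURN: pop(s, VType.INT); break;
                    case ARETURN: pop(s, VType.REF); break;
                }
                if (i.op != Op.GOTO && i.op != Op.IRETURN && i.op != Op.ARETURN) next.add(pc + 1);
                for (int t : next) {
                    if (t < 0 || t >= code.length) throw new IllegalStateException("control leaves the code");
                    if (in[t] == null) { in[t] = s.copy(); work.push(t); }
                    else if (merge(in[t], s)) work.push(t);  // entry state changed: re-examine t
                }
            } catch (IllegalStateException e) {
                return "pc " + pc + ": " + e.getMessage();
            }
        }
        return null;
    }

    static int reg(State s, int r) {
        if (r < 0 || r >= s.regs.length) throw new IllegalStateException("no register " + r);
        return r;
    }
    static VType load(State s, int r, VType expected) {
        VType t = s.regs[reg(s, r)];
        if (t != expected) throw new IllegalStateException("register " + r + " holds " + t);
        return t;
    }
    static void push(State s, VType t, int maxStack) {
        if (s.stack.size() >= maxStack) throw new IllegalStateException("stack overflow");
        s.stack.add(t);
    }
    static VType pop(State s, VType expected) {
        if (s.stack.isEmpty()) throw new IllegalStateException("stack underflow");
        VType t = s.stack.remove(s.stack.size() - 1);
        if (t != expected) throw new IllegalStateException("expected " + expected + ", found " + t);
        return t;
    }
    static boolean merge(State old, State s) { // stacks must match exactly; disagreeing registers become TOP
        if (!old.stack.equals(s.stack)) throw new IllegalStateException("stack mismatch at a join point");
        boolean changed = false;
        for (int r = 0; r < old.regs.length; r++)
            if (old.regs[r] != s.regs[r] && old.regs[r] != VType.TOP) { old.regs[r] = VType.TOP; changed = true; }
        return changed;
    }

    // factorial(int) from the listing, with branch targets as instruction indices.
    static final Insn[] FACTORIAL = {
        new Insn(Op.ICONST), new Insn(Op.ISTORE, 1), new Insn(Op.ILOAD, 0), new Insn(Op.IFLE, 10),
        new Insn(Op.ILOAD, 1), new Insn(Op.ILOAD, 0), new Insn(Op.IMUL), new Insn(Op.ISTORE, 1),
        new Insn(Op.IINC, 0), new Insn(Op.GOTO, 2), new Insn(Op.ILOAD, 1), new Insn(Op.IRETURN) };
    static final Insn[] UNINIT = { new Insn(Op.ILOAD, 1), new Insn(Op.IRETURN) };  // reads res before storing it
    static final Insn[] BAD_TYPE = { new Insn(Op.ICONST), new Insn(Op.ARETURN) }; // int where a reference is expected

    public static void main(String[] args) {
        System.out.println(verify(FACTORIAL, new VType[] { VType.INT }, 2, 2)); // null (accepted)
        System.out.println(verify(UNINIT, new VType[] { VType.INT }, 2, 2));    // pc 0: register 1 holds TOP
        System.out.println(verify(BAD_TYPE, new VType[0], 0, 1));               // pc 1: expected REF, found INT
    }
}
```

With one int parameter, 2 registers and 2 stack slots, `FACTORIAL` is accepted (`verify` returns null). `UNINIT` is rejected at pc 0 because register 1 still holds TOP, and `BAD_TYPE` at pc 1 because `areturn` finds an int. The loop terminates because a register can only move toward TOP and stacks must match exactly at join points, so each entry state changes a bounded number of times.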

In other words, bytecode verification shifts the checks listed above from run time to load time.

The above explanation is taken from Xavier Leroy's paper "Java bytecode verification: algorithms and formalizations".

Rahul Tripathi answered Sep 22 '22 04:09
