Clarifications on Bytecode and objects

I am writing a Bytecode instrumenter. Right now, I am trying to find out how to do that in the presence of objects. I would like some clarifications on two lines I read in the JVMS (section 4.9.4):

1) "The verifier rejects code that uses the new object before it has been initialized."

My question is, what does "uses" mean here? I'm guessing that it means passing it as a method argument, calling GETFIELD or PUTFIELD on it, or calling any instance method on it. Are there other forbidden uses? I believe it follows that other instructions such as DUP, LOAD, and STORE are allowed.

2) "Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass."

Which means that in an <init> method, GETFIELD and PUTFIELD are allowed before another <init> is called. However, in Java, doing any operation on an instance field before a call to super() or this() results in a compilation error. Could someone clarify this?

3) I have one more question. When does an object reference become initialized and hence ready to be freely used? From reading the JVMS, I came to the conclusion that whether an object is initialized is decided per method: at a given point in time, the object can be initialized for one method but not for another. Specifically, an object becomes initialized for a method when the <init> called by that method returns.

For example, suppose main() creates an object and calls <init>, which then calls the superclass's <init>. After returning from super(), the object is considered initialized by <init>, but it is not yet initialized for main(). Does this mean that in <init>, after super(), I can pass the object as a parameter to a method, even before returning to main()?

Could someone confirm that this whole analysis is true? Thank you for your time.

PS: I actually posted the same question to the Sun forums but got no response. I hope I'll have more luck here. Thank you.

Update

First thank you for your answers and time. Although I didn't get a clear-cut answer (I had many questions and some of them were a bit vague), your answers and examples, and the subsequent experiments, were extremely useful for me in understanding more deeply how the JVM works.

The main thing I discovered is that the verifier's behavior differs across implementations and versions (which makes bytecode manipulation much more complicated). The cause is either non-conformance to the JVMS, a lack of documentation from the verifier's developers, or some subtle vagueness in the JVMS's treatment of the verifier.

One last thing: SO rocks! I posted the same question in the official Sun JVM Specifications forum, and I have still not received an answer there.

asked Jul 19 '10 06:07 by H-H


2 Answers

Contrary to what the Java language specifies, at the bytecode level it is possible to assign fields of a class in a constructor before calling the superclass constructor. The following code uses the ASM library to create such a class:

package asmconstructortest;

import java.io.FileOutputStream;
import org.objectweb.asm.*;
import org.objectweb.asm.util.CheckClassAdapter;
import static org.objectweb.asm.Opcodes.*;

public class Main {

    public static void main(String[] args) throws Exception {
        //ASMifierClassVisitor.main(new String[]{"/Temp/Source/asmconstructortest/build/classes/asmconstructortest/Test.class"});
        ClassWriter cw = new ClassWriter(0);
        CheckClassAdapter ca = new CheckClassAdapter(cw);

        ca.visit(V1_5, ACC_PUBLIC + ACC_SUPER, "asmconstructortest/Test2", null, "java/lang/Object", null);

        {
            FieldVisitor fv = ca.visitField(ACC_PUBLIC, "property", "I", null, null);
            fv.visitEnd();
        }

        {
            MethodVisitor mv = ca.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
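            mv.visitCode();
            // Store 1 into this.property BEFORE the superclass constructor runs.
            mv.visitVarInsn(ALOAD, 0);
            mv.visitInsn(ICONST_1);
            mv.visitFieldInsn(PUTFIELD, "asmconstructortest/Test2", "property", "I");
            // Only now invoke java.lang.Object.<init> on this.
            mv.visitVarInsn(ALOAD, 0);
            mv.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V");
            mv.visitInsn(RETURN);
            mv.visitMaxs(2, 1); // max stack 2, max locals 1
            mv.visitEnd();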
        }

        ca.visitEnd();

        FileOutputStream out = new FileOutputStream("/Temp/Source/asmconstructortest/build/classes/asmconstructortest/Test2.class");
        out.write(cw.toByteArray());
        out.close();
    }
}

Instantiating this class works fine, without any verification errors:

package asmconstructortest;

public class Main2 {
    public static void main(String[] args) {
        Test2 test2 = new Test2();
        System.out.println(test2.property);
    }
}
answered Oct 21 '22 03:10 by Jörn Horstmann


"The verifier rejects code that uses the new object before it has been initialized."

In bytecode verification, since the verifier works at link-time, the types of local variables of methods are inferred. The types of method arguments are known as they are in the method signature in the class file. The types of other local variables are not known and are inferred, so I assume the "uses" in the above statement relates to this.

EDIT: Section 4.9.4 of the JVMS reads:

The instance initialization method (§3.9) for class myClass sees the new uninitialized object as its this argument in local variable 0. Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass.

This assignment of fields in the above statement is the "initial" initialization of the instance variables to default initial values (int is 0, float is 0.0f, etc.) when the memory for the object is allocated. There is one more "proper" initialization of instance variables when the virtual machine invokes the instance initialization method (constructor) on the object.


The link provided by Jörn Horstmann helped clarify things. So these statements don't hold true: "This DOES NOT mean that in an <init> method, getfield and putfield are allowed before another <init> is called. The getfield and putfield instructions are used to access (and change) the instance variables (fields) of a class (or instance of a class). And this can happen only when the instance variables (fields) are initialized."

From the JVMS :

Each instance initialization method (§3.9), except for the instance initialization method derived from the constructor of class Object, must call either another instance initialization method of this or an instance initialization method of its direct superclass super before its instance members are accessed. However, instance fields of this that are declared in the current class may be assigned before calling any instance initialization method.
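javac itself appears to rely on this exception. As a rough sketch (the class names Base, Outer, Inner, and EarlyFieldDemo are made up for illustration), and assuming a javac that targets class file version 1.4 or later: for an inner class, the compiler stores the reference to the enclosing instance in a synthetic field before it invokes the superclass <init>. That is why the overridden method below can already read the outer field while the superclass constructor is still running.

```java
// Hypothetical classes for illustration only.
class Base {
    Base() {
        hook(); // runs before the subclass constructor body
    }

    void hook() {
    }
}

class Outer {
    String name = "outer";

    class Inner extends Base {
        String seen; // no initializer, so nothing overwrites the value set in hook()

        @Override
        void hook() {
            // Reads Outer.this.name through the synthetic outer-instance field,
            // which was assigned before Base.<init> was invoked.
            seen = name;
        }
    }

    String probe() {
        return new Inner().seen;
    }
}

class EarlyFieldDemo {
    public static void main(String[] args) {
        System.out.println(new Outer().probe());
    }
}
```

Running javap -c on Outer$Inner should show a putfield on the synthetic field ahead of the invokespecial that calls Base.<init>, which is the pattern an instrumenter has to tolerate.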

When the Java Virtual Machine creates a new instance of a class, either implicitly or explicitly, it first allocates memory on the heap to hold the object's instance variables. Memory is allocated for all variables declared in the object's class and in all its superclasses, including instance variables that are hidden. As soon as the virtual machine has set aside the heap memory for a new object, it immediately initializes the instance variables to default initial values. Once the virtual machine has allocated memory for the new object and initialized the instance variables to default values, it is ready to give the instance variables their proper initial values. The Java Virtual Machine uses two techniques to do this, depending upon whether the object is being created because of a clone() invocation. If the object is being created because of a clone(), the virtual machine copies the values of the instance variables of the object being cloned into the new object. Otherwise, the virtual machine invokes an instance initialization method on the object. The instance initialization method initializes the object's instance variables to their proper initial values. And only after this can you use getfield and putfield.

The Java compiler generates at least one instance initialization method (constructor) for every class it compiles. If the class declares no constructors explicitly, the compiler generates a default no-arg constructor that just invokes the superclass no-arg constructor. And rightly so, doing any operation on an instance field before a call to super() or this() results in a compilation error.
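The generated default constructor can be observed with reflection. This is only a small sketch; NoCtor and DefaultCtorDemo are hypothetical names, not part of the original question.

```java
import java.lang.reflect.Constructor;

// Hypothetical class that declares no constructor at all.
class NoCtor {
    int value;
}

class DefaultCtorDemo {
    // Number of constructors the compiler put into NoCtor.
    static int declaredConstructorCount() {
        return NoCtor.class.getDeclaredConstructors().length;
    }

    // Number of parameters of the single generated constructor.
    static int parameterCount() {
        Constructor<?> ctor = NoCtor.class.getDeclaredConstructors()[0];
        return ctor.getParameterTypes().length;
    }

    public static void main(String[] args) {
        System.out.println(declaredConstructorCount() + " constructor(s), "
                + parameterCount() + " parameter(s)");
    }
}
```

Each such constructor shows up in the class file as an <init> method, so an instrumenter can expect at least one <init> in every compiled class.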

An <init> method can contain three kinds of code: an invocation of another <init> method, code that implements any instance variable initializers, and code for the body of the constructor. If a constructor begins with an explicit invocation of another constructor in the same class (a this() invocation) its corresponding <init> method will be composed of two parts:

  • an invocation of the same-class <init> method
  • the bytecodes that implement the body of the corresponding constructor

If a constructor does not begin with a this() invocation and the class is not Object, the <init> method will have three components:

  • an invocation of a superclass <init> method
  • the bytecodes for any instance variable initializers
  • the bytecodes that implement the body of the corresponding constructor


If a constructor does not begin with a this() invocation and the class is Object (which has no superclass), then its <init> method cannot begin with a superclass <init> method invocation. If a constructor begins with an explicit invocation of a superclass constructor (a super() invocation), its <init> method will invoke the corresponding superclass <init> method.
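The three-part layout described above can be made visible from plain Java by logging each step. A minimal sketch, with InitOrder, Parent, Child, and note() being illustrative names only:

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    // Shared log of the order in which the pieces of <init> run.
    static final List<String> LOG = new ArrayList<>();

    // Records a step and returns a dummy value usable in a field initializer.
    static int note(String step) {
        LOG.add(step);
        return 0;
    }

    static class Parent {
        Parent() {
            note("Parent constructor body");
        }
    }

    static class Child extends Parent {
        int field = note("Child field initializer");

        Child() {
            // The implicit super() call runs before anything else here.
            note("Child constructor body");
        }
    }

    // Creates one Child and returns the recorded order of steps.
    static List<String> run() {
        LOG.clear();
        new Child();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log lists the Parent constructor body first, then the Child field initializer, then the Child constructor body, matching the order of the three components in Child's <init>.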



I think this answers your first and second question.

Updated:

For example,

  class Demo
  {
      int somint;

      Demo() // first constructor
      {
          this(5);
          // some other stuff...
      }

      Demo(int i) // second constructor
      {
          this.somint = i;
          // some other stuff...
      }

      Demo(int i, int j) // third constructor
      {
          super();
          // other stuff...
      }
  }

Here's the bytecode for the above three constructors as produced by the compiler (javac):

Demo();
  Code:
   Stack=2, Locals=1, Args_size=1
   0:   aload_0
   1:   iconst_5
   2:   invokespecial   #1; //Method "<init>":(I)V
   5:   return

Demo(int);
  Code:
   Stack=2, Locals=2, Args_size=2
   0:   aload_0
   1:   invokespecial   #2; //Method java/lang/Object."<init>":()V
   4:   aload_0
   5:   iload_1
   6:   putfield        #3; //Field somint:I
   9:   return

Demo(int, int);
  Code:
   Stack=1, Locals=3, Args_size=3
   0:   aload_0
   1:   invokespecial   #2; //Method java/lang/Object."<init>":()V
   4:   return

In the first constructor, the <init> method begins by calling the same-class <init> method and then executes the body of the corresponding constructor. Because the constructor begins with this(), its corresponding <init> method does not contain bytecode for initializing the instance variables.
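One way to check this from Java source is to give an instance variable initializer a visible side effect and count how often it runs. A small sketch under that assumption; Delegating and its members are hypothetical names:

```java
class Delegating {
    static int initializerRuns = 0;

    // Instance variable initializer with a visible side effect.
    int marker = ++initializerRuns;

    Delegating() {
        this(5); // this <init> holds no initializer code of its own
    }

    Delegating(int i) {
        // Implicit super(), then the initializer for marker, then this body.
    }

    // Creates one object through the delegating constructor and
    // reports how many times the initializer ran.
    static int runsForNoArgConstructor() {
        initializerRuns = 0;
        new Delegating();
        return initializerRuns;
    }

    public static void main(String[] args) {
        System.out.println(runsForNoArgConstructor());
    }
}
```

The initializer runs once, inside Delegating(int), even though two <init> methods execute. If the compiler copied the initializer into both, the count would be two.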

In the second constructor, the <init> method contains:

  • an invocation of the superclass <init> method, i.e., the no-arg superclass constructor; the compiler generated this by default because no explicit super() was found as the first statement.
  • the bytecode that assigns the instance variable somint (offsets 4 to 6); here it comes from the statement this.somint = i in the constructor body, since somint has no initializer.
  • the bytecode for the rest of the constructor body.
answered Oct 21 '22 01:10 by Zaki