Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Constructor bytecode

The ASM guide talks about constructors:

package pkg;
public class Bean {
  private int f;
  public int getF() {
      return this.f;
  }
  public void setF(int f) {
      this.f = f;
  }
}

The Bean class also has a default public constructor which is generated by the compiler, since no explicit constructor was defined by the programmer. This default public constructor is generated as Bean() { super(); }. The bytecode of this constructor is the following:

ALOAD 0
INVOKESPECIAL java/lang/Object <init> ()V
RETURN

The first instruction pushes this on the operand stack. The second instruction pops this value from the stack, and calls the <init> method defined in the Object class. This corresponds to the super() call, i.e. a call to the constructor of the super class, Object. You can see here that constructors are named differently in compiled and source classes: in compiled classes they are always named <init>, while in source classes they have the name of the class in which they are defined. Finally the last instruction returns to the caller.

How is the value of this already known to the JVM before the first instruction of the constructor?

like image 327
rapt Avatar asked Jan 02 '23 15:01

rapt


2 Answers

At the JVM level, first the object is allocated, uninitialized, then the constructor is invoked on that object. The constructor is more-or-less an instance method executed on the uninitialized object.

Even in the Java language, this exists and has all its fields at the first line of the constructor.

like image 141
Louis Wasserman Avatar answered Jan 04 '23 04:01

Louis Wasserman


The first thing to understand, is how object instantiation work on the bytecode level.

As explained in JVMS, §3.8. Working with Class Instances:

Java Virtual Machine class instances are created using the Java Virtual Machine's new instruction. Recall that at the level of the Java Virtual Machine, a constructor appears as a method with the compiler-supplied name <init>. This specially named method is known as the instance initialization method (§2.9). Multiple instance initialization methods, corresponding to multiple constructors, may exist for a given class. Once the class instance has been created and its instance variables, including those of the class and all of its superclasses, have been initialized to their default values, an instance initialization method of the new class instance is invoked. For example:

   Object create() {
       return new Object();
   }

compiles to:

   Method java.lang.Object create()
   0   new #1              // Class java.lang.Object
   3   dup
   4   invokespecial #4    // Method java.lang.Object.<init>()V
   7   areturn

So the constructor invocation via invokespecial shares the behavior of passing this as the first argument with invokevirtual.

However, it must be emphasized that a reference to an uninitialized reference is treated specially, as you are not allowed to use it before the constructor (or the super constructor when you’re inside the constructor) has been invoked. This is enforced by the verifier.

JVMS, §4.10.2.4. Instance Initialization Methods and Newly Created Objects:

… The instance initialization method (§2.9) for class myClass sees the new uninitialized object as its this argument in local variable 0. Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass.

When doing dataflow analysis on instance methods, the verifier initializes local variable 0 to contain an object of the current class, or, for instance initialization methods, local variable 0 contains a special type indicating an uninitialized object. After an appropriate instance initialization method is invoked (from the current class or its direct superclass) on this object, all occurrences of this special type on the verifier's model of the operand stack and in the local variable array are replaced by the current class type. The verifier rejects code that uses the new object before it has been initialized or that initializes the object more than once. In addition, it ensures that every normal return of the method has invoked an instance initialization method either in the class of this method or in the direct superclass.

Similarly, a special type is created and pushed on the verifier's model of the operand stack as the result of the Java Virtual Machine instruction new. The special type indicates the instruction by which the class instance was created and the type of the uninitialized class instance created. When an instance initialization method declared in the class of the uninitialized class instance is invoked on that class instance, all occurrences of the special type are replaced by the intended type of the class instance. This change in type may propagate to subsequent instructions as the dataflow analysis proceeds.

So code creating an object via the new instruction can’t use it in any way before the constructor has been called, whereas a constructor’s code can only assign fields before another (this(…) or super(…)) constructor has been called (an opportunity used by inner classes to initialize their outer instance reference as a first action), but still can’t do anything else with their uninitialized this.

It’s also not allowed for a constructor to return when this is still in the uninitialized state. Hence, the automatically generated constructor bears the required minimum, invoking the super constructor and returning (there is no implicit return on the byte code level).

It’s generally recommended to read The Java® Virtual Machine Specification (resp. its Java 11 version) alongside to any ASM specific documentation or tutorials.

like image 24
Holger Avatar answered Jan 04 '23 05:01

Holger