
Why does the Java compiler generate weird local variables and stack map frames, and how can I use them to reliably determine variable types?

I'm creating a Java bytecode instrumentation tool with the help of the ASM framework, and I need to determine, and possibly change, the types of a method's local variables. Very quickly I ran into a simple case where the variables and stack map frame nodes look somewhat weird and don't give me enough information about the variables being used:

public static void test() {
    List l = new ArrayList();
    for (Object i : l) {
        int a = (int)i;
    }
}

This gives the following bytecode (from IntelliJ IDEA):

public static test()V
   L0
    LINENUMBER 42 L0
    NEW java/util/ArrayList
    DUP
    INVOKESPECIAL java/util/ArrayList.<init> ()V
    ASTORE 0
   L1
    LINENUMBER 43 L1
    ALOAD 0
    INVOKEINTERFACE java/util/List.iterator ()Ljava/util/Iterator;
    ASTORE 1
   L2
   FRAME APPEND [java/util/List java/util/Iterator]
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.hasNext ()Z
    IFEQ L3
    ALOAD 1
    INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object;
    ASTORE 2
   L4
    LINENUMBER 44 L4
    ALOAD 2
    CHECKCAST java/lang/Integer
    INVOKEVIRTUAL java/lang/Integer.intValue ()I
    ISTORE 3
   L5
    LINENUMBER 45 L5
    GOTO L2
   L3
    LINENUMBER 46 L3
   FRAME CHOP 1
    RETURN
   L6
    LOCALVARIABLE i Ljava/lang/Object; L4 L5 2
    LOCALVARIABLE l Ljava/util/List; L1 L6 0
    MAXSTACK = 2
    MAXLOCALS = 4

As one can see, all 4 explicitly and implicitly defined variables take 1 slot each and 4 slots are reserved, but only 2 are declared, in a strange order (slot 2 before slot 0) and with a "hole" between them. The list iterator is later written to this "hole" with ASTORE 1 without the type of that variable being declared first. Only after this operation does a stack map frame appear, but it is unclear to me why only 2 variables are put into it, because more than 2 are used later. Later, with ISTORE 3, an int is again written into a variable slot without any declaration.

At this point it looks like I need to ignore variable definitions altogether and infer all types by interpreting the bytecode, that is, by simulating the JVM operand stack and local variable slots.

I tried ASM's EXPAND_FRAMES option, but it did not help: it only changes the type of the single frame node to F_NEW, and the rest still looks exactly as before.

Can anybody explain why I see such strange code, and whether I have other options besides writing my own JVM interpreter?

Conclusion, based on all the answers (please correct me again if I'm wrong):

Variable definitions (the LocalVariableTable) only match source variable names and types to specific variable slots over specific ranges of code. They are apparently ignored by the JVM class verifier and during code execution, and they can be absent or fail to match the actual bytecode.

Variable slots are treated like another stack, albeit one accessed by index in units of 32-bit words (long and double values take two slots). A slot's contents can always be overwritten with a different temporary, as long as the load and store instructions used match the type of the value.

Stack map frame nodes list the variables allocated from the beginning of the local variable array up to the last variable that is going to be loaded in the subsequent code without being stored first. This allocation map is expected to be the same regardless of which execution path was taken to reach the frame's label. Frames also contain a similar map for the operand stack. Their contents may be specified as increments relative to the preceding stack map frame.

Variables that exist only within linear sequences of code appear in a stack map frame only if variables with a longer lifetime are allocated at higher slot indices.
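
The incremental frame encoding described above can be sketched in plain Java. This is a minimal illustration, not ASM code: the class FrameDeltaSketch and its append and chop methods are hypothetical names, and it assumes that a static method without parameters starts with an empty list of locals. It replays the two frames from the listing (FRAME APPEND at L2, FRAME CHOP 1 at L3).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ASM: expands compressed stack map frames
// into the full list of local variable types they describe.
final class FrameDeltaSketch {

    // APPEND: the new frame has the previous locals plus the listed types.
    static List<String> append(List<String> previous, String... added) {
        List<String> locals = new ArrayList<>(previous);
        locals.addAll(Arrays.asList(added));
        return locals;
    }

    // CHOP n: the new frame has the previous locals minus the last n entries.
    static List<String> chop(List<String> previous, int count) {
        if (count < 0 || count > previous.size()) {
            throw new IllegalArgumentException("cannot chop " + count + " locals");
        }
        return new ArrayList<>(previous.subList(0, previous.size() - count));
    }

    public static void main(String[] args) {
        // static test()V: no parameters and no "this", so no initial locals (assumption stated above)
        List<String> initial = new ArrayList<>();
        List<String> atL2 = append(initial, "java/util/List", "java/util/Iterator");
        List<String> atL3 = chop(atL2, 1);
        System.out.println("L2: " + atL2);
        System.out.println("L3: " + atL3);
    }
}
```

Slots 2 and 3 never show up in either frame: on every path through the loop body they are stored before they are loaded, so no frame has to mention them.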

noop asked Dec 18 '22 02:12



1 Answer

The LocalVariableTable attribute matches variables in the source code to variable slots in the method's bytecode. This optional attribute is mostly for debuggers (for example, to print the correct name of a variable).

As you've already answered yourself, in order to infer the type of a local variable or of an expression you have to iterate through the bytecode, either from the beginning of the method or from the nearest stack map frame. The StackMapTable attribute contains stack map frames only at merge points.
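
Iterating through the bytecode to recover slot types can be sketched without any library. This is a toy illustration under stated assumptions, not a verifier: the class LocalTypeSketch, its simulate method and the {opcode, operand} string encoding are all hypothetical; every invocation is collapsed into a single INVOKE pseudo-opcode that pops only a receiver and pushes the given return type; and only the fall-through path of the loop from the question is followed, so nothing is merged at branch targets. For real class files, ASM's org.objectweb.asm.tree.analysis package (Analyzer and Frame) performs this kind of simulation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical toy simulator: tracks the type held in each local slot
// while walking straight-line code. Not a real bytecode verifier.
final class LocalTypeSketch {

    static String[] simulate(int maxLocals, String[][] code) {
        String[] locals = new String[maxLocals];
        Arrays.fill(locals, "TOP");                 // TOP = slot holds nothing usable yet
        Deque<String> stack = new ArrayDeque<>();   // operand stack of type names
        for (String[] insn : code) {
            String op = insn[0];
            String arg = insn[1];
            switch (op) {
                case "NEW":       stack.push(arg); break;
                case "DUP":       stack.push(stack.peek()); break;
                case "ALOAD":     stack.push(locals[Integer.parseInt(arg)]); break;
                case "ASTORE":
                case "ISTORE":    locals[Integer.parseInt(arg)] = stack.pop(); break;
                case "CHECKCAST": stack.pop(); stack.push(arg); break;
                case "INVOKE":    // simplification: receiver only, no arguments
                    stack.pop();
                    if (!arg.equals("V")) stack.push(arg);
                    break;
                case "IFEQ":      stack.pop(); break;   // follow the fall-through path only
                default: throw new IllegalArgumentException("unsupported opcode: " + op);
            }
        }
        return locals;
    }

    // The question's method up to ISTORE 3, in the simplified encoding.
    static String[][] exampleFromQuestion() {
        return new String[][] {
            {"NEW", "java/util/ArrayList"}, {"DUP", ""},
            {"INVOKE", "V"},                        // ArrayList.<init>
            {"ASTORE", "0"},
            {"ALOAD", "0"}, {"INVOKE", "java/util/Iterator"}, {"ASTORE", "1"},
            {"ALOAD", "1"}, {"INVOKE", "I"},        // hasNext: boolean is an int on the stack
            {"IFEQ", ""},
            {"ALOAD", "1"}, {"INVOKE", "java/lang/Object"}, {"ASTORE", "2"},
            {"ALOAD", "2"}, {"CHECKCAST", "java/lang/Integer"},
            {"INVOKE", "I"},                        // Integer.intValue
            {"ISTORE", "3"},
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(4, exampleFromQuestion())));
    }
}
```

For the question's method this yields java/util/ArrayList, java/util/Iterator, java/lang/Object and I for slots 0 to 3. Slot 0 comes out as the more specific java/util/ArrayList, whereas the compiler's frame at L2 records java/util/List.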

apangin answered Jan 04 '23 22:01
