I'm creating a Java bytecode instrumentation tool with the help of the ASM framework, and I need to determine, and possibly change, the types of a method's local variables. Very quickly I ran into a simple case where the variables and stack map nodes look somewhat weird and don't give me enough information about the variables being used:
public static void test() {
    List l = new ArrayList();
    for (Object i : l) {
        int a = (int)i;
    }
}
This gives the following bytecode (as shown by IntelliJ IDEA):
public static test()V
L0
LINENUMBER 42 L0
NEW java/util/ArrayList
DUP
INVOKESPECIAL java/util/ArrayList.<init> ()V
ASTORE 0
L1
LINENUMBER 43 L1
ALOAD 0
INVOKEINTERFACE java/util/List.iterator ()Ljava/util/Iterator;
ASTORE 1
L2
FRAME APPEND [java/util/List java/util/Iterator]
ALOAD 1
INVOKEINTERFACE java/util/Iterator.hasNext ()Z
IFEQ L3
ALOAD 1
INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object;
ASTORE 2
L4
LINENUMBER 44 L4
ALOAD 2
CHECKCAST java/lang/Integer
INVOKEVIRTUAL java/lang/Integer.intValue ()I
ISTORE 3
L5
LINENUMBER 45 L5
GOTO L2
L3
LINENUMBER 46 L3
FRAME CHOP 1
RETURN
L6
LOCALVARIABLE i Ljava/lang/Object; L4 L5 2
LOCALVARIABLE l Ljava/util/List; L1 L6 0
MAXSTACK = 2
MAXLOCALS = 4
As one can see, all 4 explicitly and implicitly defined variables take 1 slot each and 4 slots are reserved (MAXLOCALS = 4), but only 2 are declared, in a strange order (slot 2 before slot 0) and with a "hole" between them. The list iterator is later written into this "hole" with ASTORE 1, without the type of that variable being declared first. Only after this operation does a stack map frame appear, and it is unclear to me why only 2 variables are put into it when more than 2 are used later. Later still, ISTORE 3 writes an int into another variable slot, again without any declaration.
At this point it looks like I need to ignore the variable definitions altogether and infer all types by interpreting the bytecode, that is, by simulating the JVM stack.
I tried ASM's EXPAND_FRAMES option, but it is useless: it only changes the type of the frame node to F_NEW, with the rest still looking exactly as before.
Can anybody explain why I see such strange code, and whether I have other options besides writing my own JVM interpreter?
Conclusion, based on all the answers (please correct me again if I'm wrong):
Variable definitions only match source variable names and types to specific variable slots accessed within specific ranges of the code. They are apparently ignored by the JVM class verifier and during code execution, and they can be absent or fail to match the actual bytecode.
Variable slots behave like a second storage area next to the operand stack, except that they are addressed by index, one 32-bit word per slot (long and double values take two slots). It is always possible to overwrite a slot with a different temporary, as long as each load instruction matches the type of the value stored there last.
Stack map frame nodes list the variables allocated from the beginning of the local variable array up to the last variable that is going to be loaded in the subsequent code without being stored first. This allocation map is expected to be the same regardless of which execution path was taken to reach the frame's label. Frames also contain a similar map for the operand stack. Their contents may be specified as increments relative to the preceding stack map frame node.
Variables that exist only within linear sequences of code appear in a stack map frame node only if variables with a longer lifetime are allocated at higher slot indices.
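To make the "increments relative to the preceding frame" point concrete, here is a minimal sketch of how the two compressed frames in the listing (FRAME APPEND and FRAME CHOP 1) change the recorded locals. It is a toy model, not ASM code: the class `FrameDeltaDemo` and its methods `append` and `chop` are names made up for this illustration, and types are kept as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical, not part of ASM) of APPEND and CHOP frame deltas. */
final class FrameDeltaDemo {

    /** APPEND: the previous frame's locals plus the listed ones; operand stack is empty. */
    static List<String> append(List<String> previous, String... added) {
        List<String> locals = new ArrayList<>(previous);
        locals.addAll(Arrays.asList(added));
        return locals;
    }

    /** CHOP k: the previous frame's locals minus the last k; operand stack is empty. */
    static List<String> chop(List<String> previous, int k) {
        return new ArrayList<>(previous.subList(0, previous.size() - k));
    }

    public static void main(String[] args) {
        // test() is static and takes no arguments, so the implicit
        // initial frame of the method has no locals at all.
        List<String> initial = Collections.emptyList();

        // L2: FRAME APPEND [java/util/List java/util/Iterator]
        List<String> atL2 = append(initial, "java/util/List", "java/util/Iterator");

        // L3: FRAME CHOP 1 (relative to the frame at L2, not to slots 2 and 3)
        List<String> atL3 = chop(atL2, 1);

        System.out.println("L2: " + atL2); // L2: [java/util/List, java/util/Iterator]
        System.out.println("L3: " + atL3); // L3: [java/util/List]
    }
}
```

Slots 2 and 3 never show up because they are stored and loaded only inside the loop body and are not live at either label.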
LocalVariableTable is for matching variables in the source code to variable slots in the method bytecode. This optional attribute is mostly for debuggers (to print the correct name of a variable).
As you've already answered yourself, in order to infer the type of a local variable or of an expression you have to iterate through the bytecode, either from the beginning of the method or from the nearest stack map. The StackMapTable attribute contains stack maps only at the merge points.
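As a rough illustration of "iterating from the nearest stack map", the sketch below replays the loop body of test() starting from the locals recorded at L2 and tracks what each slot holds. Everything in it is an assumption made for the example: the class `SlotTypeSketch`, the string-based instruction encoding, and the `INVOKE` pseudo-opcode (pop count plus return type) that stands in for the real invoke instructions. It follows only the fall-through path and does not merge branches. In practice, ASM's `org.objectweb.asm.tree.analysis` package (`Analyzer` with `BasicInterpreter` or `SimpleVerifier`) should be able to do this kind of simulation on real `MethodNode`s, so a hand-written interpreter may not be needed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

/** Toy abstract interpreter (hypothetical): tracks slot types over straight-line code. */
final class SlotTypeSketch {

    /**
     * Instruction encoding used only in this sketch:
     * {"ALOAD", slot}, {"ASTORE", slot}, {"ISTORE", slot}, {"CHECKCAST", type},
     * {"IFEQ"}, and {"INVOKE", popCount, returnType}.
     */
    static Map<Integer, String> simulate(Map<Integer, String> frameLocals, String[][] code) {
        Map<Integer, String> locals = new TreeMap<>(frameLocals);
        Deque<String> stack = new ArrayDeque<>();
        for (String[] insn : code) {
            switch (insn[0]) {
                case "ALOAD":
                    stack.push(locals.get(Integer.parseInt(insn[1])));
                    break;
                case "ASTORE":
                case "ISTORE":
                    // A store defines the slot's type; no prior declaration is needed.
                    locals.put(Integer.parseInt(insn[1]), stack.pop());
                    break;
                case "CHECKCAST":
                    stack.pop();
                    stack.push(insn[1]);
                    break;
                case "IFEQ":
                    stack.pop(); // consumes the int condition; fall-through path only
                    break;
                case "INVOKE":
                    for (int i = Integer.parseInt(insn[1]); i > 0; i--) {
                        stack.pop();
                    }
                    stack.push(insn[2]);
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn[0]);
            }
        }
        return locals;
    }

    /** Replays L2..ISTORE 3 of test(), starting from the frame recorded at L2. */
    static Map<Integer, String> loopBodyOfTest() {
        Map<Integer, String> atL2 = new TreeMap<>();
        atL2.put(0, "java/util/List");
        atL2.put(1, "java/util/Iterator");
        String[][] code = {
            {"ALOAD", "1"}, {"INVOKE", "1", "int"}, {"IFEQ"},     // hasNext()
            {"ALOAD", "1"}, {"INVOKE", "1", "java/lang/Object"}, // next()
            {"ASTORE", "2"},
            {"ALOAD", "2"}, {"CHECKCAST", "java/lang/Integer"},
            {"INVOKE", "1", "int"},                              // intValue()
            {"ISTORE", "3"},
        };
        return simulate(atL2, code);
    }

    public static void main(String[] args) {
        // {0=java/util/List, 1=java/util/Iterator, 2=java/lang/Object, 3=int}
        System.out.println(loopBodyOfTest());
    }
}
```

The result recovers the types of slots 2 and 3, which neither the LocalVariableTable nor the stack map frames in the listing describe completely.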