
How to statically analyze reference types passed to each bytecode instruction?

I have rewritten the question (it remains the same, just with less background noise) in the hope of causing less confusion about the wrong things. Because of this, some of the comments below may seem out of context.

When analyzing Java bytecode, what is the easiest way to find all the possible reference types passed as operands to a given bytecode instruction? I'm interested in the type of the reference: for example, that a given putfield instruction will receive an Integer, or that it might receive either an Integer or a Float.

For example, consider this code block:

   0:   aload_1
   1:   invokestatic    #21; //Method java/lang/Integer.valueOf:(Ljava/lang/String;)Ljava/lang/Integer;
   4:   astore_2
   5:   aload_2
   6:   ifnull  17
   9:   aload_0
   10:  aload_2
   11:  putfield    #27; //Field value:Ljava/lang/Number;
   14:  goto    25
   17:  aload_0
   18:  iconst_0
   19:  invokestatic    #29; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   22:  putfield    #27; //Field value:Ljava/lang/Number;
   25:  return

We can deduce that the putfield instruction at pc 11 will receive a ref type of Integer.

0: aload_1 pushes a ref type of String (the method parameter)
1: invokestatic pops the String ref and pushes a ref type of Integer (the invoked method's return type)
4: astore_2 pops the Integer ref and stores it in local variable 2
5: aload_2 pushes the Integer ref from local variable 2
6: ifnull pops the Integer ref and conditionally jumps to pc 17
9: aload_0 pushes "this"
10: aload_2 pushes the Integer ref
11: putfield pops both, so we know the value it stores in the field is a ref type of Integer
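
The walkthrough above can be sketched as a tiny abstract interpreter that tracks type names instead of values. This is only an illustration, not a real analyzer: the Insn class, the hand-encoded LISTING array and the simulate method are hypothetical names made up for this sketch, it uses no bytecode library, and it follows only the fall-through path (it assumes the ifnull at pc 6 does not jump).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: tracks reference types along the fall-through path of the listing. */
class RefTypeSketch {

    /** One hand-encoded instruction (an assumption of this sketch, not a real class-file parser). */
    static final class Insn {
        final int pc;
        final String op;
        final int operand;   // local variable index, or argument count for invokestatic
        final String type;   // return type for invokestatic, otherwise unused

        Insn(int pc, String op, int operand, String type) {
            this.pc = pc; this.op = op; this.operand = operand; this.type = type;
        }
    }

    /** pc 0 to 11 of the listing, assuming the ifnull branch at pc 6 is not taken. */
    static final Insn[] LISTING = {
        new Insn(0, "aload", 1, null),
        new Insn(1, "invokestatic", 1, "java/lang/Integer"),
        new Insn(4, "astore", 2, null),
        new Insn(5, "aload", 2, null),
        new Insn(6, "ifnull", 0, null),
        new Insn(9, "aload", 0, null),
        new Insn(10, "aload", 2, null),
        new Insn(11, "putfield", 0, null),
    };

    /** Maps each putfield pc to the type of the value on top of the stack when it runs. */
    static Map<Integer, String> simulate(Insn[] code, String[] initialLocals) {
        String[] locals = initialLocals.clone();
        Deque<String> stack = new ArrayDeque<>();
        Map<Integer, String> seen = new LinkedHashMap<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case "aload":
                    stack.push(locals[insn.operand]);
                    break;
                case "astore":
                    locals[insn.operand] = stack.pop();
                    break;
                case "invokestatic":
                    for (int i = 0; i < insn.operand; i++) stack.pop(); // arguments
                    stack.push(insn.type);                              // return value
                    break;
                case "ifnull":
                    stack.pop(); // the tested reference; the jump itself is ignored here
                    break;
                case "putfield":
                    seen.put(insn.pc, stack.pop()); // the value being stored
                    stack.pop();                    // the object reference ("this")
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn.op);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Locals on entry: 0 = this, 1 = the String parameter, 2 = not yet assigned.
        String[] locals = {"this", "java/lang/String", "unassigned"};
        System.out.println(simulate(LISTING, locals)); // {11=java/lang/Integer}
    }
}
```

A full analysis would also have to follow the branch to pc 17 and combine the states wherever control flow joins, which is the part a library's data-flow analyzer would take care of.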

Do any of the bytecode/code analysis libraries do this for me, or do I have to write it myself? The ASM project has an Analyzer, which seems like it might do part of the work for me, but not enough to justify switching to it.

EDIT: I have done my homework and have studied the Java VM Spec.

Sami Koivu asked Jun 04 '11 08:06


2 Answers

The Analyzer.analyze(...) method seems to do exactly what you need, and if it does not, you have the option of adapting it. This would be a better approach than starting from scratch.

Another idea would be to see if you can find a bytecode verifier that is implemented in Java. A verifier must use data flow analysis to ensure that methods don't get called with the wrong type of parameters.
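
To illustrate the data-flow idea: when two branches deliver different reference types to the same instruction, an analysis of this kind typically merges them into their closest common superclass. The sketch below shows such a merge using plain reflection. TypeMergeSketch and merge are hypothetical names for this example only, and it assumes ordinary class types (no interfaces, arrays or primitives).

```java
/** Hypothetical sketch of a verifier-style merge of two reference types. */
class TypeMergeSketch {

    /** Returns the closest common superclass of a and b (class types only). */
    static Class<?> merge(Class<?> a, Class<?> b) {
        Class<?> c = a;
        // Walk up a's superclass chain until we reach a class that b also extends.
        while (c != null && !c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c == null ? Object.class : c;
    }

    public static void main(String[] args) {
        System.out.println(merge(Integer.class, Float.class).getName());  // java.lang.Number
        System.out.println(merge(Integer.class, String.class).getName()); // java.lang.Object
    }
}
```

Note that a merge like this reports Number for "Integer or Float" and so loses the individual types. If the goal is the full set of possible types, the analysis would presumably need to carry a set of types per stack slot and take the union at join points instead.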

Stephen C answered Oct 05 '22 05:10


I have needed to do pretty much the exact same thing on a project of mine. You might want to take a look at the source code here (in the visitEnd() method). It uses an Analyzer from the ASM project to take a 'snapshot' of the stack frame at the time of a PUTFIELD instruction. Those snapshots are then stored and can be retrieved once the visitor has finished. Part of the information contained in a snapshot is the type of the reference at the top of the stack.

The particular class linked to above is designed to be subclassed; an example of a subclass is here (check out visitMethod()). At the time I needed to do this, I turned to StackOverflow too. You may want to check out the question I asked then, particularly the link provided in the accepted answer, which provided the basis of the code I eventually used.

Grundlefleck answered Oct 05 '22 06:10