
Dynamic Java Bytecode Manipulation Framework Comparison

There are several frameworks for dynamic bytecode generation, manipulation and weaving (BCEL, CGLIB, javassist, ASM, MPS). I want to learn about them, but since I don't have time to learn all the details of every one, I would like to see a sort of comparison chart listing the advantages and disadvantages of each versus the others, with an explanation of why.

Here on SO, I found a lot of questions asking something similar, and the answers normally said "you can use cglib or ASM", or "javassist is better than cglib", or "BCEL is old and is dying" or "ASM is the best because it gives X and Y". These answers are useful, but they do not fully answer the question in the scope that I want: comparing the frameworks more deeply and giving the advantages and disadvantages of each one.

asked Feb 06 '12 21:02 by Victor Stafusa



2 Answers

Analysis of bytecode libraries

As far as I can tell from the answers you got here and the ones in the questions you have looked at, none of them formally addresses the question in the explicit manner you have stated. You asked for a comparison, whereas these answers either vaguely state what one might want depending on the target (e.g. Do you need to know bytecode? [y/n]) or are too narrow.

This answer is a short analysis of each bytecode framework, and provides a quick comparison at the end.

Javassist

  • Tiny (javassist.jar (3.21.0) is ~707KB / javassist-rel_3_22_0_cr1.zip is ~1.5MB)
  • High(/Low)-level
  • Straightforward
  • Feature-complete
  • Requires minimal to no class file format knowledge
  • Requires moderate Java instruction set knowledge
  • Minimal learning effort
  • Has some quirks in the single-line/multi-line compile-and-insert-bytecode methods

I personally prefer Javassist simply because of how quickly you can get to using it and building and manipulating classes with it. The tutorial is straightforward and easy to follow. The jar file is a tiny 707KB, so it is nice and portable, which makes it suitable for standalone applications.


ASM

  • Large (asm-6.0_ALPHA-bin.zip is ~2.9MB / asm-svn-latest.tar.gz (10/15/2016) is ~41MB)
  • Low(/High)-level
  • Comprehensive
  • Feature-complete
  • Proficient knowledge of the class file format recommended
  • Requires proficiency with Java instruction set
  • Moderate learning effort (somewhat complex)

ASM by ObjectWeb is a very comprehensive library which lacks nothing related to building, generating, and loading classes. In fact, it even has class analysis tools with predefined analyzers. It is said to be the industry standard for bytecode manipulation. It is also the reason why I steer clear of it.

When I see examples of ASM, it seems like a cumbersome beast of a task with the number of lines it takes to modify or load a class. Even some of the parameters to some methods seem a bit cryptic and out of place for Java. With things like ACC_PUBLIC, and plenty of method calls with null everywhere, it honestly does look like it is better suited for a low-level language like C. Why not simply just pass a String literal like "public", or an enum Modifier.PUBLIC? It's more friendly and easy to use. That is my opinion, however.
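For context on where those integer constants come from: the class file format stores access modifiers as a bit mask, and ASM's ACC_* constants mirror those bits. The JDK's own java.lang.reflect.Modifier uses the same encoding, so the idea can be sketched without any bytecode library. The class AccessFlagsDemo and its helper methods below are made up for this illustration; only java.lang.reflect.Modifier is a real API.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class; only java.lang.reflect.Modifier is a real JDK API here.
class AccessFlagsDemo {

    // Combine individual modifier bits into one int, the way a class file's
    // access_flags field (and ASM's ACC_* constants) represent them.
    static int publicStaticFinal() {
        return Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;
    }

    // Turn a flag mask back into the keywords a Java programmer would write.
    static String describe(int flags) {
        return Modifier.toString(flags);
    }

    public static void main(String[] args) {
        int flags = publicStaticFinal();
        System.out.println("0x" + Integer.toHexString(flags) + " -> " + describe(flags));
        System.out.println("is static? " + Modifier.isStatic(flags));
    }
}
```

A string or enum API would be friendlier to read, but a single int maps directly onto what ends up in the class file, which is presumably why a low-level library exposes it that way.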

For reference, here is an ASM (4.0) tutorial: https://www.javacodegeeks.com/2012/02/manipulating-java-class-files-with-asm.html


BCEL

  • Small (bcel-6.0-bin.zip is 7.3MB / bcel-6.0-src.zip is 1.4MB)
  • Low-level
  • Adequate
  • Gets the job done
  • Requires proficiency with Java instruction set
  • Easy to learn

From what I have seen, this library is your basic class library that lets you do everything you need to—if you can spare a few months or years.

Here is a BCEL tutorial that really spells it out: http://www.geekyarticles.com/2011/08/manipulating-java-class-files-with-bcel.html?m=1


cglib

  • Very tiny (cglib-3.2.5.jar is 295KB/source code)
  • Depends on ASM
  • High-level
  • Feature-complete (Bytecode Generation)
  • Little or no Java bytecode knowledge needed
  • Easy to learn
  • Esoteric Library

Despite the fact that you can read information from classes, and that you can transform classes, the library seems tailored to proxies. The tutorial is all about beans for the proxies, and it even mentions it is used by "data access frameworks to generate dynamic proxy objects and intercept field access." Still, I see no reason why you can't use it for the simpler purpose of bytecode manipulation instead of proxies.
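For readers unfamiliar with what "proxies" means here, the following is a rough sketch of the interception idea using only the JDK's java.lang.reflect.Proxy. It is not cglib code: cglib's own API is not shown, and unlike cglib, which generates subclasses, JDK proxies only work for interfaces. The Greeter interface and the logging helper are invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; Greeter and logging() are illustration-only names.
class ProxyDemo {

    interface Greeter {
        String greet(String name);
    }

    // Returns a proxy that records the name of every intercepted method call
    // in the given log and then delegates to the real target object.
    static Greeter logging(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());          // the "interception" step
            return method.invoke(target, args); // forward to the real object
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter greeter = logging(name -> "Hello, " + name, log);
        System.out.println(greeter.greet("world"));
        System.out.println("intercepted: " + log);
    }
}
```

The proxy class itself is generated at runtime, which is the same kind of bytecode generation cglib performs, just hidden behind a narrow API.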


ByteBuddy

  • Small bin/"Huge" src (by comparison) (byte-buddy-dep-1.8.12.jar is ~2.72 MB / 1.8.12 (zip) is 124.537 MB (exact))
  • Depends on ASM
  • High-level
  • Feature-complete
  • Personally, a peculiar name for a Service Pattern class (ByteBuddy.class)
  • Little or no Java byte code knowledge needed
  • Easy to learn

Long story short, where BCEL is lacking, ByteBuddy is abundant. It uses a primary class called ByteBuddy using the Service Design Pattern. You create a new instance of ByteBuddy, and this represents a class that you want to modify. When you are done with your modifications, you can then make a DynamicType with make().

On their website is a full tutorial with API documentation. The purpose seems to be for rather high-level modifications. When it comes to methods, there does not appear to be anything in the official tutorial, or any 3rd party tutorial, about creating a method from scratch, apart from delegating a method (EDITME if you know where this is explained).

Their tutorial can be found here on their website. Some examples can be found here.


Java Class Assistant (jCLA)

I am building my own bytecode library, which will be called Java Class Assistant, or jCLA for short. I started it because of another project I am working on and because of the quirks with Javassist mentioned above. I will not make a release until it is finished, but the project is currently available to browse on GitHub and give feedback on. It is in alpha, but still workable enough to be a basic class library (I am currently working on the compilers; please help me if you can, and it will be released a lot sooner!).

It will be quite bare bones with the ability to read and write class files to and from a JAR file, as well as the ability to compile and decompile bytecode to and from source code and class files.

The overall usage pattern makes jCLA rather easy to work with, though it may take some getting used to. Its style of methods and method parameters for class modifications is apparently quite similar to ByteBuddy's:

import java.io.IOException;

import jcla.ClassPool;
import jcla.ClassBuilder;
import jcla.ClassDefinition;
import jcla.MethodBuilder;
import jcla.FieldBuilder;

import jcla.jar.JavaArchive;

import jcla.classfile.ClassFile;

import jcla.io.ClassFileOutputStream;

public class JCLADemo {

    public static void main(String... args) {
        // get the class pool for this JVM instance
        ClassPool classes = ClassPool.getLocal();
        // get a class that is loaded in the JVM
        ClassDefinition classDefinition = classes.get("my.package.MyNumberPrinter");
        // create a class builder to modify the class
        ClassBuilder clMyNumberPrinter = new ClassBuilder(classDefinition);

        // create a new method with name printNumber
        MethodBuilder printNumber = new MethodBuilder("printNumber");
        // add access modifiers (use modifiers() for convenience)
        printNumber.modifier(Modifier.PUBLIC);
        // set return type (void)
        printNumber.returns("void");
        // add a parameter (use parameters() for convenience)
        printNumber.parameter("int", "number");
        // set the body of the method (compiled to bytecode)
        // use body(byte[]) or insert(byte[]) for bytecode
        // insert(String) also compiles to bytecode
        printNumber.body("System.out.println(\"the number is: \" + number);");
        // add the method to the class
        // you can use method(MethodDefinition) or method(MethodBuilder)
        clMyNumberPrinter.method(printNumber.build());

        // add a field to the class
        FieldBuilder HELLO = new FieldBuilder("HELLO");
        // set the modifiers for hello; convenience method example
        HELLO.modifiers(Modifier.PRIVATE, Modifier.STATIC, Modifier.FINAL);
        // set the type of this field
        HELLO.type("java.lang.String");
        // set the actual value of this field
        // this overloaded method expects a VariableInitializer production
        HELLO.value("\"Hello from \" + getClass().getSimpleName() + \"!\"");

        // add the field to the class (same overloads as clMyNumberPrinter.method())
        clMyNumberPrinter.field(HELLO.build());

        // redefine
        classDefinition = clMyNumberPrinter.build();
        // update the class definition in the JVM's ClassPool
        // (this updates the actual JVM's loaded class)
        classes.update(classDefinition);

        // write to disk
        JavaArchive archive = new JavaArchive("myjar.jar");
        ClassFile classFile = new ClassFile(classDefinition);
        ClassFileOutputStream stream = new ClassFileOutputStream(archive);

        try {
            stream.write(classFile);
        } catch(IOException e) {
            // print to System.out
            System.out.println("could not write class file: " + e.getMessage());
        } finally {
            stream.close();
        }
    }

}

(VariableInitializer production specification for your convenience.)

As may be implied from the above snippet, each ClassDefinition is immutable. This makes jCLA more secure, thread-safe, network-safe, and easy to use. The system revolves primarily around ClassDefinitions as the object of choice for querying information about a class in a high-level manner, and the system is built in such a way that ClassDefinition is converted to and from target types such as ClassBuilder and ClassFile.

jCLA uses a tiered system for class data. At the bottom, you have the immutable ClassFile: a struct or software representation of a class file. Then you have immutable ClassDefinitions, which are converted from ClassFiles into something less cryptic and more manageable for the programmer who is modifying or reading data from the class; they are comparable to the information accessed through java.lang.Class. Finally, you have mutable ClassBuilders. The ClassBuilder is how classes are modified or created. It lets you create a ClassDefinition directly from the builder's current state. Creating a new builder for each class is not necessary, as the reset() method will clear the variables.
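To make the bottom tier a little more concrete: a class file is a binary structure that begins with a magic number followed by a minor and a major version. The sketch below reads just that prefix using only the JDK. It assumes nothing about jCLA's actual ClassFile type; ClassHeaderDemo and Header are invented names, and a real representation would go on to parse the constant pool, fields, methods and attributes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical names throughout; this is not jCLA's ClassFile API.
class ClassHeaderDemo {

    // Immutable view of the first three fields of a class file.
    static final class Header {
        final int magic; // 0xCAFEBABE for a valid class file
        final int minor; // minor_version, unsigned 16-bit
        final int major; // major_version, unsigned 16-bit (52 means Java 8)

        Header(int magic, int minor, int major) {
            this.magic = magic;
            this.minor = minor;
            this.major = major;
        }
    }

    // Parses the fixed 8-byte prefix: u4 magic, u2 minor_version, u2 major_version.
    static Header read(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short for a class file");
        }
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new Header(magic, minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Read the bytes of a class that ships with the running JDK.
        try (InputStream in = Object.class.getResourceAsStream("/java/lang/Object.class")) {
            if (in == null) {
                System.out.println("class bytes not available");
                return;
            }
            Header h = read(in.readAllBytes());
            System.out.printf("magic=%08X major=%d minor=%d%n", h.magic, h.major, h.minor);
        }
    }
}
```

Keeping Header immutable mirrors the design described above: once parsed, the data can be shared freely, and any change goes through a separate builder.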

(Analysis of this library will be available as soon as it is ready for release.)

But until then, as of today:

  • Small (src: 227.704 KB exact, 6/2/2018)
  • Self-sufficient (no dependencies except Java's shipped library)
  • High-level
  • No knowledge of Java bytecode or class files required (for the tier 1 API, e.g. ClassBuilder, ClassDefinition, etc.)
  • Easy to learn (even easier if coming from ByteBuddy)

I still recommend learning about Java bytecode, however. It will make debugging easier.


Comparison

Considering all of these analyses (excluding jCLA for now), the broadest framework is ASM, the easiest to use is Javassist, the most basic implementation is BCEL, and the most high-level for bytecode generation and proxies is cglib.

ByteBuddy deserves its own explanation. It is easy to use like Javassist, but appears to be lacking some of the features that make Javassist great, such as method creation from scratch, so you would need to use ASM for that apparently. If you need to do some lightweight modification with classes, ByteBuddy is the way to go, but for more advanced modification of classes while maintaining a high level of abstraction, Javassist is a better choice.

Note: if I missed a library, please edit this answer or mention it in a comment.

answered Oct 17 '22 15:10 by AMDG


If your interest in bytecode generation is only to use it, the comparison chart becomes rather simple:

Do you need to understand bytecode?

for javassist: no

for all others: yes

Of course, even with javassist you may be confronted with bytecode concepts at some point. Likewise, some of the other libraries (such as ASM) have a more high-level API and/or tool support to shield you from many of the bytecode details.

What really distinguishes javassist, though, is the inclusion of a basic Java compiler. This makes it very easy to write complex class transformations: you only have to put a Java fragment in a String and use the library to insert it at specific points in the program. The included compiler will build the equivalent bytecode, which is then inserted into the existing classes.

answered Oct 17 '22 17:10 by user1205938