Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I control the order of constant pool entries using ASM?

I'm implementing a transformation that removes unused elements from .class files to reduce their size. Because some constant pool entries will become unused, I'm having ASM recompute the constant pool, rather than copying it over from the input. However, the transformed .class files are sometimes larger than the originals because ASM's constant pool ordering requires using ldc_w instructions (with a 2-byte index) where the input .class file used ldc (with a 1-byte index). I'd like to manually sort the constant pool such that constants referred to by ldc come first.

One might want to sort the constant pool for other reasons too: for example, to make a set of .class files more compressible by putting their constant pools in a canonical order, to test tools that consume .class files, to use the order as a software watermark, or to confuse poorly-implemented decompilers/deobfuscators.

I grepped through the ASM guide for "constant", but there were no useful hits besides a generic explanation of what the constant pool is and "Hopefully ASM hides all the details related to the constant pool, so you will not have to bother about it.", which is anti-helpful in this case.

How can I control the order in which ASM emits constant pool entries?

like image 513
Jeffrey Bosboom Avatar asked Mar 16 '15 14:03

Jeffrey Bosboom


1 Answers

ASM doesn't provide a clean way to do this, but it's possible if you're willing to define new classes in the org.objectweb.asm package (or use reflection to access package-private members). This is not ideal because it introduces a dependence on ASM's implementation details, but it's the best we can do. (If you know of a non-hack way to do this, please add it as another answer.)

Some things that don't work

ClassWriter exposes newConst (and variants for other constant pool entry types) to allow implementing custom attributes. Because ASM will reuse constant pool entries, you might assume you can pre-fill the constant pool in your desired order by calling newConst and friends. However, many constant pool entries reference other constant pool entries (particularly Utf8 entries, which are referenced by String and Class entries), and these methods will automatically add referenced entries if not already present. Thus it is not possible to put a String constant before the Utf8 it references, for example. These methods can be overridden, but doing so does not help because this behavior is baked into the package-private or private methods they delegate to.

This post suggests sorting ClassWriter's internal data structures in an overloaded visitEnd. This does not work for two reasons. First, visitEnd is final (perhaps it wasn't back in 2005 when that post was written). Second, ClassWriter emits the class bytes during the visitation, so by the time visitEnd is called, the constant pool is already written as bytes and constant pool indices are already baked into code bytes.

The solution

The solution requires two rounds of class writing. First we'll write the class normally (including other transformations), then use another ClassWriter with a pre-filled constant pool to parse and rewrite the result of the first round. Because ClassWriter builds the constant pool bytes as it goes, we have to do that manually before beginning the second parse and write. We'll encapsulate the second parse/write in the first ClassWriter's toByteArray method.

Here's the code. The actual sorting occurs in the sortItems method; here we're sorting by the number of occurrences as an ldc/ldc_w operand (collected by a MethodVisitor; note that visitMethod is final so it must be separate). If you want to implement a different sort, change sortItems and add fields to store whatever your sort is based on.

package org.objectweb.asm;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ConstantPoolSortingClassWriter extends ClassWriter {
    private final int flags;
    Map<Item, Integer> constantHistogram; //initialized by ConstantHistogrammer
    public ConstantPoolSortingClassWriter(int flags) {
        super(flags);
        this.flags = flags;
    }

    @Override
    public byte[] toByteArray() {
        byte[] bytes = super.toByteArray();

        List<Item> cst = new ArrayList<>();
        for (Item i : items)
            for (Item j = i; j != null; j = j.next) {
                //exclude ASM's internal bookkeeping
                if (j.type == TYPE_NORMAL || j.type == TYPE_UNINIT ||
                        j.type == TYPE_MERGED || j.type == BSM)
                    continue;
                if (j.type == CLASS) 
                    j.intVal = 0; //for ASM's InnerClesses tracking
                cst.add(j);
            }

        sortItems(cst);

        ClassWriter target = new ClassWriter(flags);
        //ClassWriter.put is private, so we have to do the insert manually
        //we don't bother resizing the hashtable
        for (int i = 0; i < cst.size(); ++i) {
            Item item = cst.get(i);
            item.index = target.index++;
            if (item.type == LONG || item.type == DOUBLE)
                target.index++;

            int hash = item.hashCode % target.items.length;
            item.next = target.items[hash];
            target.items[hash] = item;
        }

        //because we didn't call newFooItem, we need to manually write pool bytes
        //we can call newFoo to find existing items, though
        for (Item i : cst) {
            if (i.type == UTF8)
                target.pool.putByte(UTF8).putUTF8(i.strVal1);
            if (i.type == CLASS || i.type == MTYPE || i.type == STR)
                target.pool.putByte(i.type).putShort(target.newUTF8(i.strVal1));
            if (i.type == IMETH || i.type == METH || i.type == FIELD)
                target.pool.putByte(i.type).putShort(target.newClass(i.strVal1)).putShort(target.newNameType(i.strVal2, i.strVal3));
            if (i.type == INT || i.type == FLOAT)
                target.pool.putByte(i.type).putInt(i.intVal);
            if (i.type == LONG || i.type == DOUBLE)
                target.pool.putByte(i.type).putLong(i.longVal);
            if (i.type == NAME_TYPE)
                target.pool.putByte(i.type).putShort(target.newUTF8(i.strVal1)).putShort(target.newUTF8(i.strVal2));
            if (i.type >= HANDLE_BASE && i.type < TYPE_NORMAL) {
                int tag = i.type - HANDLE_BASE;
                if (tag <= Opcodes.H_PUTSTATIC)
                    target.pool.putByte(HANDLE).putByte(tag).putShort(target.newField(i.strVal1, i.strVal2, i.strVal3));
                else
                    target.pool.putByte(HANDLE).putByte(tag).putShort(target.newMethod(i.strVal1, i.strVal2, i.strVal3, tag == Opcodes.H_INVOKEINTERFACE));
            }
            if (i.type == INDY)
                target.pool.putByte(INDY).putShort((int)i.longVal).putShort(target.newNameType(i.strVal1, i.strVal2));
        }

        //parse and rewrite with the new ClassWriter, constants presorted
        ClassReader r = new ClassReader(bytes);
        r.accept(target, 0);
        return target.toByteArray();
    }

    private void sortItems(List<Item> items) {
        items.forEach(i -> constantHistogram.putIfAbsent(i, 0));
        //constants appearing more often come first, so we use as few ldc_w as possible
        Collections.sort(items, Comparator.comparing(constantHistogram::get).reversed());
    }
}

Here's ConstantHistogrammer, which is in org.objectweb.asm so it can refer to Item. This implementation is specific to the ldc sort, but it demonstrates how to perform other custom sorts based on information from the .class file.

package org.objectweb.asm;

import java.util.HashMap;
import java.util.Map;

public final class ConstantHistogrammer extends ClassVisitor {
    private final ConstantPoolSortingClassWriter cw;
    private final Map<Item, Integer> constantHistogram = new HashMap<>();
    public ConstantHistogrammer(ConstantPoolSortingClassWriter cw) {
        super(Opcodes.ASM5, cw);
        this.cw = cw;
    }
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        return new CollectLDC(super.visitMethod(access, name, desc, signature, exceptions));
    }
    @Override
    public void visitEnd() {
        cw.constantHistogram = constantHistogram;
        super.visitEnd();
    }
    private final class CollectLDC extends MethodVisitor {
        private CollectLDC(MethodVisitor mv) {
            super(Opcodes.ASM5, mv);
        }
        @Override
        public void visitLdcInsn(Object cst) {
            //we only care about things ldc can load
            if (cst instanceof Integer || cst instanceof Float || cst instanceof String ||
                    cst instanceof Type || cst instanceof Handle)
                constantHistogram.merge(cw.newConstItem(cst), 1, Integer::sum);
            super.visitLdcInsn(cst);
        }
    }
}

Finally, here's how you use them together:

byte[] inputBytes = Files.readAllBytes(input);
ClassReader cr = new ClassReader(inputBytes);
ConstantPoolSortingClassWriter cw = new ConstantPoolSortingClassWriter(0);
ConstantHistogrammer ch = new ConstantHistogrammer(cw);
ClassVisitor s = new SomeOtherClassVisitor(ch);
cr.accept(s, 0);
byte[] outputBytes = cw.toByteArray();

The transformation applied by SomeOtherClassVisitor will occur only on the first visitation, not on the second visitation inside cw.toByteArray().

There's no test suite for this, but I applied the above sort to rt.jar from Oracle JDK 8u40, and NetBeans 8.0.2 functions normally using the transformed class files, so it is at least mostly correct. (The transformation saved 12684 bytes, which is hardly worthwhile on its own.)

The code is available as a Gist, under the same license as ASM itself.

like image 123
Jeffrey Bosboom Avatar answered Oct 20 '22 01:10

Jeffrey Bosboom