I'm implementing a transformation that removes unused elements from .class files to reduce their size. Because some constant pool entries will become unused, I'm having ASM recompute the constant pool, rather than copying it over from the input. However, the transformed .class files are sometimes larger than the originals because ASM's constant pool ordering requires using ldc_w
instructions (with a 2-byte index) where the input .class file used ldc
(with a 1-byte index). I'd like to manually sort the constant pool such that constants referred to by ldc
come first.
One might want to sort the constant pool for other reasons too: for example, to make a set of .class files more compressible by putting their constant pools in a canonical order, to test tools that consume .class files, to use the order as a software watermark, or to confuse poorly-implemented decompilers/deobfuscators.
I grepped through the ASM guide for "constant", but there were no useful hits besides a generic explanation of what the constant pool is and "Hopefully ASM hides all the details related to the constant pool, so you will not have to bother about it.", which is anti-helpful in this case.
How can I control the order in which ASM emits constant pool entries?
ASM doesn't provide a clean way to do this, but it's possible if you're willing to define new classes in the org.objectweb.asm
package (or use reflection to access package-private members). This is not ideal because it introduces a dependence on ASM's implementation details, but it's the best we can do. (If you know of a non-hack way to do this, please add it as another answer.)
ClassWriter exposes newConst
(and variants for other constant pool entry types) to allow implementing custom attributes. Because ASM will reuse constant pool entries, you might assume you can pre-fill the constant pool in your desired order by calling newConst
and friends. However, many constant pool entries reference other constant pool entries (particularly Utf8 entries, which are referenced by String and Class entries), and these methods will automatically add referenced entries if not already present. Thus it is not possible to put a String constant before the Utf8 it references, for example. These methods can be overridden, but doing so does not help because this behavior is baked into the package-private or private methods they delegate to.
This post suggests sorting ClassWriter's internal data structures in an overloaded visitEnd
. This does not work for two reasons. First, visitEnd
is final (perhaps it wasn't back in 2005 when that post was written). Second, ClassWriter emits the class bytes during the visitation, so by the time visitEnd
is called, the constant pool is already written as bytes and constant pool indices are already baked into code bytes.
The solution requires two rounds of class writing. First we'll write the class normally (including other transformations), then use another ClassWriter with a pre-filled constant pool to parse and rewrite the result of the first round. Because ClassWriter builds the constant pool bytes as it goes, we have to do that manually before beginning the second parse and write. We'll encapsulate the second parse/write in the first ClassWriter's toByteArray
method.
Here's the code. The actual sorting occurs in the sortItems
method; here we're sorting by the number of occurrences as an ldc
/ldc_w
operand (collected by a MethodVisitor; note that visitMethod
is final so it must be separate). If you want to implement a different sort, change sortItems
and add fields to store whatever your sort is based on.
package org.objectweb.asm;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
public class ConstantPoolSortingClassWriter extends ClassWriter {
private final int flags;
Map<Item, Integer> constantHistogram; //initialized by ConstantHistogrammer
public ConstantPoolSortingClassWriter(int flags) {
super(flags);
this.flags = flags;
}
@Override
public byte[] toByteArray() {
byte[] bytes = super.toByteArray();
List<Item> cst = new ArrayList<>();
for (Item i : items)
for (Item j = i; j != null; j = j.next) {
//exclude ASM's internal bookkeeping
if (j.type == TYPE_NORMAL || j.type == TYPE_UNINIT ||
j.type == TYPE_MERGED || j.type == BSM)
continue;
if (j.type == CLASS)
j.intVal = 0; //for ASM's InnerClesses tracking
cst.add(j);
}
sortItems(cst);
ClassWriter target = new ClassWriter(flags);
//ClassWriter.put is private, so we have to do the insert manually
//we don't bother resizing the hashtable
for (int i = 0; i < cst.size(); ++i) {
Item item = cst.get(i);
item.index = target.index++;
if (item.type == LONG || item.type == DOUBLE)
target.index++;
int hash = item.hashCode % target.items.length;
item.next = target.items[hash];
target.items[hash] = item;
}
//because we didn't call newFooItem, we need to manually write pool bytes
//we can call newFoo to find existing items, though
for (Item i : cst) {
if (i.type == UTF8)
target.pool.putByte(UTF8).putUTF8(i.strVal1);
if (i.type == CLASS || i.type == MTYPE || i.type == STR)
target.pool.putByte(i.type).putShort(target.newUTF8(i.strVal1));
if (i.type == IMETH || i.type == METH || i.type == FIELD)
target.pool.putByte(i.type).putShort(target.newClass(i.strVal1)).putShort(target.newNameType(i.strVal2, i.strVal3));
if (i.type == INT || i.type == FLOAT)
target.pool.putByte(i.type).putInt(i.intVal);
if (i.type == LONG || i.type == DOUBLE)
target.pool.putByte(i.type).putLong(i.longVal);
if (i.type == NAME_TYPE)
target.pool.putByte(i.type).putShort(target.newUTF8(i.strVal1)).putShort(target.newUTF8(i.strVal2));
if (i.type >= HANDLE_BASE && i.type < TYPE_NORMAL) {
int tag = i.type - HANDLE_BASE;
if (tag <= Opcodes.H_PUTSTATIC)
target.pool.putByte(HANDLE).putByte(tag).putShort(target.newField(i.strVal1, i.strVal2, i.strVal3));
else
target.pool.putByte(HANDLE).putByte(tag).putShort(target.newMethod(i.strVal1, i.strVal2, i.strVal3, tag == Opcodes.H_INVOKEINTERFACE));
}
if (i.type == INDY)
target.pool.putByte(INDY).putShort((int)i.longVal).putShort(target.newNameType(i.strVal1, i.strVal2));
}
//parse and rewrite with the new ClassWriter, constants presorted
ClassReader r = new ClassReader(bytes);
r.accept(target, 0);
return target.toByteArray();
}
private void sortItems(List<Item> items) {
items.forEach(i -> constantHistogram.putIfAbsent(i, 0));
//constants appearing more often come first, so we use as few ldc_w as possible
Collections.sort(items, Comparator.comparing(constantHistogram::get).reversed());
}
}
Here's ConstantHistogrammer, which is in org.objectweb.asm
so it can refer to Item
. This implementation is specific to the ldc
sort, but it demonstrates how to perform other custom sorts based on information from the .class file.
package org.objectweb.asm;
import java.util.HashMap;
import java.util.Map;
public final class ConstantHistogrammer extends ClassVisitor {
private final ConstantPoolSortingClassWriter cw;
private final Map<Item, Integer> constantHistogram = new HashMap<>();
public ConstantHistogrammer(ConstantPoolSortingClassWriter cw) {
super(Opcodes.ASM5, cw);
this.cw = cw;
}
@Override
public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
return new CollectLDC(super.visitMethod(access, name, desc, signature, exceptions));
}
@Override
public void visitEnd() {
cw.constantHistogram = constantHistogram;
super.visitEnd();
}
private final class CollectLDC extends MethodVisitor {
private CollectLDC(MethodVisitor mv) {
super(Opcodes.ASM5, mv);
}
@Override
public void visitLdcInsn(Object cst) {
//we only care about things ldc can load
if (cst instanceof Integer || cst instanceof Float || cst instanceof String ||
cst instanceof Type || cst instanceof Handle)
constantHistogram.merge(cw.newConstItem(cst), 1, Integer::sum);
super.visitLdcInsn(cst);
}
}
}
Finally, here's how you use them together:
byte[] inputBytes = Files.readAllBytes(input);
ClassReader cr = new ClassReader(inputBytes);
ConstantPoolSortingClassWriter cw = new ConstantPoolSortingClassWriter(0);
ConstantHistogrammer ch = new ConstantHistogrammer(cw);
ClassVisitor s = new SomeOtherClassVisitor(ch);
cr.accept(s, 0);
byte[] outputBytes = cw.toByteArray();
The transformation applied by SomeOtherClassVisitor
will occur only on the first visitation, not on the second visitation inside cw.toByteArray()
.
There's no test suite for this, but I applied the above sort to rt.jar
from Oracle JDK 8u40, and NetBeans 8.0.2 functions normally using the transformed class files, so it is at least mostly correct. (The transformation saved 12684 bytes, which is hardly worthwhile on its own.)
The code is available as a Gist, under the same license as ASM itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With